Paper Group AWR 104
The Kinetics Human Action Video Dataset. Adversarial Variational Optimization of Non-Differentiable Simulators. EmoAtt at EmoInt-2017: Inner attention sentence embedding for Emotion Intensity. On better training the infinite restricted Boltzmann machines. Rainbow: Combining Improvements in Deep Reinforcement Learning. Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks. Prediction Under Uncertainty with Error-Encoding Networks. SRL4ORL: Improving Opinion Role Labeling using Multi-task Learning with Semantic Role Labeling. EMNIST: an extension of MNIST to handwritten letters. Unsupervised Generative Modeling Using Matrix Product States. SUPRA: Open Source Software Defined Ultrasound Processing for Real-Time Applications. Detecting Cancer Metastases on Gigapixel Pathology Images. Learning from Synthetic Humans. Projection Based Weight Normalization for Deep Neural Networks. Past, Present, Future: A Computational Investigation of the Typology of Tense in 1000 Languages.
The Kinetics Human Action Video Dataset
Title | The Kinetics Human Action Video Dataset |
Authors | Will Kay, Joao Carreira, Karen Simonyan, Brian Zhang, Chloe Hillier, Sudheendra Vijayanarasimhan, Fabio Viola, Tim Green, Trevor Back, Paul Natsev, Mustafa Suleyman, Andrew Zisserman |
Abstract | We describe the DeepMind Kinetics human action video dataset. The dataset contains 400 human action classes, with at least 400 video clips for each action. Each clip lasts around 10s and is taken from a different YouTube video. The actions are human-focused and cover a broad range of classes, including human-object interactions such as playing instruments, as well as human-human interactions such as shaking hands. We describe the statistics of the dataset, how it was collected, and give some baseline performance figures for neural network architectures trained and tested for human action classification on this dataset. We also carry out a preliminary analysis of whether imbalance in the dataset leads to bias in the classifiers. |
Tasks | Action Classification, Human-Object Interaction Detection |
Published | 2017-05-19 |
URL | http://arxiv.org/abs/1705.06950v1 |
PDF | http://arxiv.org/pdf/1705.06950v1.pdf |
PWC | https://paperswithcode.com/paper/the-kinetics-human-action-video-dataset |
Repo | https://github.com/OanaIgnat/i3d_keras |
Framework | tf |
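The abstract fully specifies the dataset's shape: 400 classes, at least 400 clips per class, each clip roughly 10 s and cut from a distinct YouTube video. As a minimal, hedged sketch of what a Kinetics-style annotation record and its invariants might look like (the field names are hypothetical, not the official schema):

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class KineticsClip:
    """One annotation: a ~10 s clip cut from a unique YouTube video."""
    label: str          # one of the 400 action classes
    youtube_id: str     # source video
    start_s: float      # clip start time within the source video
    end_s: float        # clip end time, roughly start_s + 10

def check_invariants(clips):
    """Check the dataset properties stated in the abstract."""
    per_class = Counter(c.label for c in clips)
    assert len(per_class) == 400, "expected 400 action classes"
    assert all(n >= 400 for n in per_class.values()), "expected >= 400 clips per class"
    # each clip is taken from a different YouTube video
    assert len({c.youtube_id for c in clips}) == len(clips)
```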
Adversarial Variational Optimization of Non-Differentiable Simulators
Title | Adversarial Variational Optimization of Non-Differentiable Simulators |
Authors | Gilles Louppe, Joeri Hermans, Kyle Cranmer |
Abstract | Complex computer simulators are increasingly used across fields of science as generative models tying parameters of an underlying theory to experimental observations. Inference in this setup is often difficult, as simulators rarely admit a tractable density or likelihood function. We introduce Adversarial Variational Optimization (AVO), a likelihood-free inference algorithm for fitting a non-differentiable generative model incorporating ideas from generative adversarial networks, variational optimization and empirical Bayes. We adapt the training procedure of generative adversarial networks by replacing the differentiable generative network with a domain-specific simulator. We solve the resulting non-differentiable minimax problem by minimizing variational upper bounds of the two adversarial objectives. Effectively, the procedure results in learning a proposal distribution over simulator parameters, such that the JS divergence between the marginal distribution of the synthetic data and the empirical distribution of observed data is minimized. We evaluate and compare the method with simulators producing both discrete and continuous data. |
Tasks | |
Published | 2017-07-22 |
URL | http://arxiv.org/abs/1707.07113v4 |
PDF | http://arxiv.org/pdf/1707.07113v4.pdf |
PWC | https://paperswithcode.com/paper/adversarial-variational-optimization-of-non |
Repo | https://github.com/neychev/adversarial_variational_optimization |
Framework | pytorch |
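The core of AVO is replacing the differentiable GAN generator with a black-box simulator and learning a proposal distribution over its parameters; since the simulator admits no gradients, the proposal can be updated with a score-function (REINFORCE-style) estimator. The toy sketch below illustrates that idea under simplifying assumptions: a Gaussian proposal, a toy simulator, and a plain score-function gradient in place of the paper's variational upper bounds.

```python
import torch

def simulator(theta, n=256):
    """Black-box, non-differentiable simulator: here a toy Gaussian."""
    with torch.no_grad():
        return theta[0] + theta[1].abs() * torch.randn(n, 1)

disc = torch.nn.Sequential(torch.nn.Linear(1, 32), torch.nn.ReLU(), torch.nn.Linear(32, 1))
mu = torch.zeros(2, requires_grad=True)         # proposal mean over simulator params
log_sigma = torch.zeros(2, requires_grad=True)  # proposal log-std
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-3)
opt_p = torch.optim.Adam([mu, log_sigma], lr=1e-2)
x_real = 1.5 + 0.5 * torch.randn(4096, 1)       # "observed" data

for step in range(2000):
    theta = mu + log_sigma.exp() * torch.randn(2)   # sample simulator parameters
    x_fake = simulator(theta.detach())
    # discriminator step: standard GAN loss
    d_loss = -(torch.sigmoid(disc(x_real)).log().mean()
               + (1 - torch.sigmoid(disc(x_fake))).log().mean())
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # proposal step: score-function gradient, as the simulator is not differentiable
    reward = torch.sigmoid(disc(x_fake)).log().mean().detach()
    log_q = (-((theta.detach() - mu) ** 2) / (2 * (2 * log_sigma).exp()) - log_sigma).sum()
    p_loss = -reward * log_q
    opt_p.zero_grad(); p_loss.backward(); opt_p.step()
```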
EmoAtt at EmoInt-2017: Inner attention sentence embedding for Emotion Intensity
Title | EmoAtt at EmoInt-2017: Inner attention sentence embedding for Emotion Intensity |
Authors | Edison Marrese-Taylor, Yutaka Matsuo |
Abstract | In this paper we describe a deep learning system designed and built for the WASSA 2017 Emotion Intensity Shared Task. We introduce a representation learning approach based on inner attention on top of an RNN. Results show that our model performs well and is able to successfully identify emotion-bearing words to predict intensity without relying on lexicons, placing 13th among 22 shared-task competitors. |
Tasks | Representation Learning, Sentence Embedding |
Published | 2017-08-18 |
URL | http://arxiv.org/abs/1708.05521v1 |
PDF | http://arxiv.org/pdf/1708.05521v1.pdf |
PWC | https://paperswithcode.com/paper/emoatt-at-emoint-2017-inner-attention |
Repo | https://github.com/epochx/emoatt |
Framework | tf |
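The model's core, an inner-attention layer that scores each RNN hidden state and pools the states into a sentence embedding for intensity regression, could be sketched in PyTorch roughly as below. The layer sizes, the BiLSTM choice, and the sigmoid output head are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class InnerAttentionRegressor(nn.Module):
    """BiLSTM + inner attention -> sentence embedding -> emotion intensity."""
    def __init__(self, vocab_size, emb_dim=100, hidden=64):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.att = nn.Linear(2 * hidden, 1)   # scores each time step
        self.out = nn.Linear(2 * hidden, 1)   # intensity in [0, 1]

    def forward(self, tokens):                # tokens: (batch, seq_len)
        h, _ = self.rnn(self.emb(tokens))     # (batch, seq_len, 2*hidden)
        alpha = torch.softmax(self.att(torch.tanh(h)), dim=1)  # attention weights
        sent = (alpha * h).sum(dim=1)         # weighted sum = sentence embedding
        return torch.sigmoid(self.out(sent)).squeeze(-1)
```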
On better training the infinite restricted Boltzmann machines
Title | On better training the infinite restricted Boltzmann machines |
Authors | Xuan Peng, Xunzhang Gao, Xiang Li |
Abstract | The infinite restricted Boltzmann machine (iRBM) is an extension of the classic RBM. It enjoys the useful property of automatically deciding the size of the hidden layer according to the specific training data. With sufficient training, the iRBM can achieve performance competitive with that of the classic RBM. However, learning the iRBM converges slowly, because the iRBM is sensitive to the ordering of its hidden units: the learned filters change slowly from the left-most hidden unit to the right. To break this dependency between neighboring hidden units and speed up the convergence of training, a novel training strategy is proposed. The key idea is to randomly regroup the hidden units before each gradient descent step. Potentially, this learning method achieves a mixture of infinitely many iRBMs with different permutations of the hidden units, which, much like dropout, helps prevent the model from over-fitting. The original iRBM is also modified to support discriminative training. To evaluate the impact of our method on the convergence speed of learning and the model's generalization ability, several experiments were performed on the binarized MNIST and CalTech101 Silhouettes datasets. Experimental results indicate that the proposed training strategy greatly accelerates learning and enhances the generalization ability of iRBMs. |
Tasks | |
Published | 2017-09-11 |
URL | http://arxiv.org/abs/1709.03239v2 |
PDF | http://arxiv.org/pdf/1709.03239v2.pdf |
PWC | https://paperswithcode.com/paper/on-better-training-the-infinite-restricted |
Repo | https://github.com/Boltzxuann/RP-iRBM |
Framework | none |
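The proposed strategy reduces to a small operation: before each gradient step, randomly permute the hidden units, applying the same permutation to the weight columns and the hidden biases so each unit keeps its filter but loses its fixed left-to-right position. A minimal NumPy sketch, with hypothetical parameter names:

```python
import numpy as np

rng = np.random.default_rng(0)

def regroup_hidden_units(W, b_h):
    """Randomly permute hidden units before a gradient step.

    W   : (n_visible, n_hidden) weight matrix
    b_h : (n_hidden,) hidden biases
    The same permutation is applied to both, so each unit keeps its
    filter and bias but loses its fixed left-to-right position.
    """
    perm = rng.permutation(W.shape[1])
    return W[:, perm], b_h[perm]

# inside the training loop, before each CD/gradient update:
# W, b_h = regroup_hidden_units(W, b_h)
```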
Rainbow: Combining Improvements in Deep Reinforcement Learning
Title | Rainbow: Combining Improvements in Deep Reinforcement Learning |
Authors | Matteo Hessel, Joseph Modayil, Hado van Hasselt, Tom Schaul, Georg Ostrovski, Will Dabney, Dan Horgan, Bilal Piot, Mohammad Azar, David Silver |
Abstract | The deep reinforcement learning community has made several independent improvements to the DQN algorithm. However, it is unclear which of these extensions are complementary and can be fruitfully combined. This paper examines six extensions to the DQN algorithm and empirically studies their combination. Our experiments show that the combination provides state-of-the-art performance on the Atari 2600 benchmark, both in terms of data efficiency and final performance. We also provide results from a detailed ablation study that shows the contribution of each component to overall performance. |
Tasks | |
Published | 2017-10-06 |
URL | http://arxiv.org/abs/1710.02298v1 |
PDF | http://arxiv.org/pdf/1710.02298v1.pdf |
PWC | https://paperswithcode.com/paper/rainbow-combining-improvements-in-deep |
Repo | https://github.com/mingyip/Rainbow-Reinforcement-Learning-For-Chess-Games |
Framework | none |
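Of the six DQN extensions Rainbow combines, two compose especially compactly: double Q-learning and multi-step returns. The sketch below computes an n-step double-DQN target; the distributional head, prioritized replay, dueling architecture, and noisy nets are omitted, so this illustrates two ingredients rather than the full agent.

```python
import torch

def n_step_double_dqn_target(online_net, target_net, rewards, next_obs, done, gamma=0.99):
    """n-step double-DQN target: action chosen by the online net,
    evaluated by the target net.

    rewards : (batch, n) rewards for the n steps after each transition
    next_obs: (batch, obs_dim) observation n steps ahead
    done    : (batch,) 1.0 if the episode ended within the n steps
    """
    n = rewards.shape[1]
    discounts = gamma ** torch.arange(n, dtype=rewards.dtype)
    n_step_return = (rewards * discounts).sum(dim=1)
    with torch.no_grad():
        best_action = online_net(next_obs).argmax(dim=1)   # select with online net
        bootstrap = target_net(next_obs).gather(1, best_action.unsqueeze(1)).squeeze(1)
    return n_step_return + (gamma ** n) * (1.0 - done) * bootstrap
```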
Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks
Title | Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks |
Authors | Zhaofan Qiu, Ting Yao, Tao Mei |
Abstract | Convolutional Neural Networks (CNN) have been regarded as a powerful class of models for image recognition problems. Nevertheless, it is not trivial to utilize a CNN for learning spatio-temporal video representation. A few studies have shown that performing 3D convolutions is a rewarding approach to capture both spatial and temporal dimensions in videos. However, developing a very deep 3D CNN from scratch results in expensive computational cost and memory demand. A natural question is why not recycle off-the-shelf 2D networks for a 3D CNN. In this paper, we devise multiple variants of bottleneck building blocks in a residual learning framework by simulating $3\times3\times3$ convolutions with $1\times3\times3$ convolutional filters on the spatial domain (equivalent to a 2D CNN) plus $3\times1\times1$ convolutions to construct temporal connections on adjacent feature maps in time. Furthermore, we propose a new architecture, named Pseudo-3D Residual Net (P3D ResNet), that exploits all the variants of blocks, composing each at a different placement in the ResNet, following the philosophy that enhancing structural diversity while going deep can improve the power of neural networks. Our P3D ResNet achieves clear improvements on the Sports-1M video classification dataset against 3D CNN and frame-based 2D CNN baselines, by 5.3% and 1.8%, respectively. We further examine the generalization performance of the video representation produced by our pre-trained P3D ResNet on five different benchmarks and three different tasks, demonstrating superior performance over several state-of-the-art techniques. |
Tasks | Action Recognition In Videos, Video Classification |
Published | 2017-11-28 |
URL | http://arxiv.org/abs/1711.10305v1 |
PDF | http://arxiv.org/pdf/1711.10305v1.pdf |
PWC | https://paperswithcode.com/paper/learning-spatio-temporal-representation-with |
Repo | https://github.com/ZhaofanQiu/pseudo-3d-residual-networks |
Framework | pytorch |
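The central building block, which simulates a 3x3x3 convolution with a 1x3x3 spatial filter followed by a 3x1x1 temporal filter inside a residual bottleneck, maps directly onto `nn.Conv3d`. Below is a sketch of the serial (P3D-A-style) variant; the channel counts and plain ReLU layout are illustrative, not the paper's exact block.

```python
import torch
import torch.nn as nn

class P3DBlockA(nn.Module):
    """Residual bottleneck with a 1x3x3 spatial conv followed by a
    3x1x1 temporal conv (the serial, P3D-A-style decomposition)."""
    def __init__(self, channels, mid=None):
        super().__init__()
        mid = mid or channels // 4
        self.reduce = nn.Conv3d(channels, mid, kernel_size=1)
        self.spatial = nn.Conv3d(mid, mid, kernel_size=(1, 3, 3), padding=(0, 1, 1))
        self.temporal = nn.Conv3d(mid, mid, kernel_size=(3, 1, 1), padding=(1, 0, 0))
        self.expand = nn.Conv3d(mid, channels, kernel_size=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):                 # x: (batch, C, T, H, W)
        y = self.relu(self.reduce(x))
        y = self.relu(self.spatial(y))    # 2D-style conv over H, W
        y = self.relu(self.temporal(y))   # 1D conv over adjacent frames
        return self.relu(x + self.expand(y))
```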
Prediction Under Uncertainty with Error-Encoding Networks
Title | Prediction Under Uncertainty with Error-Encoding Networks |
Authors | Mikael Henaff, Junbo Zhao, Yann LeCun |
Abstract | In this work we introduce a new framework for performing temporal predictions in the presence of uncertainty. It is based on a simple idea of disentangling components of the future state which are predictable from those which are inherently unpredictable, and encoding the unpredictable components into a low-dimensional latent variable which is fed into a forward model. Our method uses a supervised training objective which is fast and easy to train. We evaluate it in the context of video prediction on multiple datasets and show that it is able to consistently generate diverse predictions without the need for alternating minimization over a latent space or adversarial training. |
Tasks | Video Prediction |
Published | 2017-11-14 |
URL | http://arxiv.org/abs/1711.04994v3 |
PDF | http://arxiv.org/pdf/1711.04994v3.pdf |
PWC | https://paperswithcode.com/paper/prediction-under-uncertainty-with-error |
Repo | https://github.com/21lva/EEN_acrobot |
Framework | tf |
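The key step is to run a deterministic forward model first, encode its residual error into a low-dimensional latent z, and feed z back into a second, conditional prediction. A minimal PyTorch sketch of that wiring follows; the module internals are placeholders, not the authors' architecture.

```python
import torch
import torch.nn as nn

class ErrorEncodingNet(nn.Module):
    def __init__(self, dim=128, z_dim=8):
        super().__init__()
        self.f0 = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.phi = nn.Linear(dim, z_dim)   # prediction error -> latent z
        self.f = nn.Sequential(nn.Linear(dim + z_dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x, y):
        """x: current state, y: true future state (both (batch, dim))."""
        y_det = self.f0(x)                          # deterministic guess
        z = self.phi(y - y_det)                     # encode the unpredictable residual
        y_cond = self.f(torch.cat([x, z], dim=1))   # prediction conditioned on z
        return y_det, y_cond

# training is plainly supervised, e.g. loss = mse(y_det, y) + mse(y_cond, y);
# at test time, different z values yield diverse predicted futures.
```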
SRL4ORL: Improving Opinion Role Labeling using Multi-task Learning with Semantic Role Labeling
Title | SRL4ORL: Improving Opinion Role Labeling using Multi-task Learning with Semantic Role Labeling |
Authors | Ana Marasović, Anette Frank |
Abstract | For over a decade, machine learning has been used to extract opinion-holder-target structures from text to answer the question “Who expressed what kind of sentiment towards what?”. Recent neural approaches do not outperform the state-of-the-art feature-based models for Opinion Role Labeling (ORL). We suspect this is due to the scarcity of labeled training data and address this issue using different multi-task learning (MTL) techniques with a related task which has substantially more data, i.e. Semantic Role Labeling (SRL). We show that two MTL models improve significantly over the single-task model for labeling of both holders and targets, on the development and the test sets. We find that the vanilla MTL model, which makes predictions using only shared ORL and SRL features, performs best. Through deeper analysis, we determine what works and what might be done to further improve ORL. |
Tasks | Fine-Grained Opinion Analysis, Multi-Task Learning |
Published | 2017-11-02 |
URL | http://arxiv.org/abs/1711.00768v3 |
PDF | http://arxiv.org/pdf/1711.00768v3.pdf |
PWC | https://paperswithcode.com/paper/srl4orl-improving-opinion-role-labeling-using |
Repo | https://github.com/amarasovic/naacl-mpqa-srl4orl |
Framework | tf |
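The "vanilla" MTL setup the paper finds best, a fully shared encoder with task-specific classifiers for ORL and SRL, might be wired as below. The tag-set sizes and the batch-alternation scheme are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SharedMTLTagger(nn.Module):
    """One shared BiLSTM encoder, two task-specific tagging heads."""
    def __init__(self, vocab, emb=100, hidden=128, n_orl_tags=7, n_srl_tags=60):
        super().__init__()
        self.emb = nn.Embedding(vocab, emb)
        self.encoder = nn.LSTM(emb, hidden, batch_first=True, bidirectional=True)
        self.orl_head = nn.Linear(2 * hidden, n_orl_tags)
        self.srl_head = nn.Linear(2 * hidden, n_srl_tags)

    def forward(self, tokens, task):
        h, _ = self.encoder(self.emb(tokens))   # shared features for both tasks
        return self.orl_head(h) if task == "orl" else self.srl_head(h)

# training alternates batches from the small ORL corpus and the much larger
# SRL corpus, so the shared encoder benefits from the extra SRL supervision.
```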
EMNIST: an extension of MNIST to handwritten letters
Title | EMNIST: an extension of MNIST to handwritten letters |
Authors | Gregory Cohen, Saeed Afshar, Jonathan Tapson, André van Schaik |
Abstract | The MNIST dataset has become a standard benchmark for learning, classification and computer vision systems. Contributing to its widespread adoption are the understandable and intuitive nature of the task, its relatively small size and storage requirements, and the accessibility and ease-of-use of the database itself. The MNIST database was derived from a larger dataset known as NIST Special Database 19, which contains digits and uppercase and lowercase handwritten letters. This paper introduces a variant of the full NIST dataset, which we call Extended MNIST (EMNIST), created following the same conversion paradigm used to build the MNIST dataset. The result is a set of datasets that constitute more challenging classification tasks involving letters and digits, and that share the same image structure and parameters as the original MNIST task, allowing for direct compatibility with all existing classifiers and systems. Benchmark results are presented along with a validation of the conversion process through comparison of the classification results on converted NIST digits and the MNIST digits. |
Tasks | |
Published | 2017-02-17 |
URL | http://arxiv.org/abs/1702.05373v2 |
PDF | http://arxiv.org/pdf/1702.05373v2.pdf |
PWC | https://paperswithcode.com/paper/emnist-an-extension-of-mnist-to-handwritten |
Repo | https://github.com/mohitiitb/NeuralNetworkVerification_GlobalRobustness |
Framework | tf |
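EMNIST ships in several splits (e.g. 'letters', 'digits', 'balanced'), each keeping the 28x28 MNIST image format. If you use torchvision, loading it looks roughly like this; the split and transform choices are illustrative.

```python
import torch
from torchvision import datasets, transforms

# torchvision's EMNIST loader; split="letters" selects the 26-class letter task.
train = datasets.EMNIST(root="data", split="letters", train=True, download=True,
                        transform=transforms.ToTensor())
loader = torch.utils.data.DataLoader(train, batch_size=128, shuffle=True)

images, labels = next(iter(loader))
print(images.shape)   # (128, 1, 28, 28): same 28x28 format as MNIST
```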
Unsupervised Generative Modeling Using Matrix Product States
Title | Unsupervised Generative Modeling Using Matrix Product States |
Authors | Zhao-Yu Han, Jun Wang, Heng Fan, Lei Wang, Pan Zhang |
Abstract | Generative modeling, which learns a joint probability distribution from data and generates samples according to it, is an important task in machine learning and artificial intelligence. Inspired by the probabilistic interpretation of quantum physics, we propose a generative model using matrix product states, a tensor network originally proposed for describing (particularly one-dimensional) entangled quantum states. Our model enjoys efficient learning analogous to the density matrix renormalization group method, which allows dynamically adjusting the dimensions of the tensors and offers an efficient direct sampling approach for generative tasks. We apply our method to generative modeling of several standard datasets, including Bars and Stripes, random binary patterns and the MNIST handwritten digits, to illustrate the abilities, features and drawbacks of our model relative to popular generative models such as the Hopfield model, Boltzmann machines and generative adversarial networks. Our work sheds light on many interesting directions of future exploration in the development of quantum-inspired algorithms for unsupervised machine learning, which may plausibly be realized on quantum devices. |
Tasks | |
Published | 2017-09-06 |
URL | http://arxiv.org/abs/1709.01662v3 |
PDF | http://arxiv.org/pdf/1709.01662v3.pdf |
PWC | https://paperswithcode.com/paper/unsupervised-generative-modeling-using-matrix |
Repo | https://github.com/congzlwag/UnsupGenModbyMPS |
Framework | none |
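The model assigns p(x) proportional to |psi(x)|^2, where psi(x) is the contraction of a matrix product state along the pixel values of x. The contraction itself is a few lines of NumPy; random tensors stand in for learned ones, and the (left bond, physical, right bond) index convention is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sites, phys_dim, bond = 6, 2, 4   # 6 binary pixels, bond dimension 4

# MPS tensors A[k] with shape (left_bond, phys_dim, right_bond);
# the boundary bonds have dimension 1.
dims = [1] + [bond] * (n_sites - 1) + [1]
tensors = [rng.normal(size=(dims[k], phys_dim, dims[k + 1])) for k in range(n_sites)]

def psi(x):
    """Contract the MPS along a configuration x (tuple of 0/1)."""
    v = np.ones((1,))
    for A, xi in zip(tensors, x):
        v = v @ A[:, xi, :]         # absorb one site's matrix
    return v.item()

# unnormalized probability of a configuration: p(x) proportional to psi(x)**2
configs = [tuple(bits) for bits in np.ndindex(*(phys_dim,) * n_sites)]
Z = sum(psi(c) ** 2 for c in configs)   # brute-force normalization (toy size only)
print(psi((0, 1, 0, 1, 1, 0)) ** 2 / Z)
```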
SUPRA: Open Source Software Defined Ultrasound Processing for Real-Time Applications
Title | SUPRA: Open Source Software Defined Ultrasound Processing for Real-Time Applications |
Authors | Rüdiger Göbl, Nassir Navab, Christoph Hennersperger |
Abstract | Research in ultrasound imaging is limited in reproducibility by two factors: first, many existing ultrasound pipelines are protected by intellectual property, rendering exchange of code difficult; second, most pipelines are implemented in special hardware, resulting in limited flexibility of the implemented processing steps on such platforms. Methods: With SUPRA we propose an open-source pipeline for fully Software Defined Ultrasound Processing for Real-time Applications to alleviate these problems. Covering all steps from beamforming to output of B-mode images, SUPRA can help improve the reproducibility of results and make modifications to the image acquisition mode accessible to the research community. We evaluate the pipeline qualitatively, quantitatively, and regarding its run-time. Results: The pipeline achieves image quality comparable to that of a clinical system and, backed by point-spread-function measurements, comparable resolution. Including all processing stages of a usual ultrasound pipeline, the run-time analysis shows that it can be executed in 2D and 3D on consumer GPUs in real time. Conclusions: Our software ultrasound pipeline opens up research in image acquisition. By giving access to ultrasound data from early stages (raw channel data, radio-frequency data), it simplifies development in imaging. Furthermore, it tackles the reproducibility of research results, as code can be shared easily and even executed without dedicated ultrasound hardware. |
Tasks | |
Published | 2017-11-16 |
URL | http://arxiv.org/abs/1711.06127v3 |
PDF | http://arxiv.org/pdf/1711.06127v3.pdf |
PWC | https://paperswithcode.com/paper/supra-open-source-software-defined-ultrasound |
Repo | https://github.com/IFL-CAMP/supra |
Framework | pytorch |
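SUPRA itself is a C++/CUDA codebase, so the sketch below is not its API; it is a generic, heavily simplified Python illustration of the stages the abstract names, from delay-and-sum beamforming of raw channel data to a log-compressed B-mode image.

```python
import numpy as np
from scipy.signal import hilbert

def das_bmode(channel_data, element_x, scanline_x, c=1540.0, fs=40e6):
    """Toy delay-and-sum beamforming + B-mode conversion.

    channel_data: (n_elements, n_samples) raw RF data
    element_x   : (n_elements,) lateral element positions [m]
    scanline_x  : (n_lines,) lateral positions of the beamformed lines [m]
    """
    n_elem, n_samp = channel_data.shape
    depths = np.arange(n_samp) * c / (2 * fs)            # sample index -> depth [m]
    rf = np.zeros((len(scanline_x), n_samp))
    for i, x in enumerate(scanline_x):
        for e in range(n_elem):
            # two-way path: down to (x, depth), back to element e
            dist = depths + np.sqrt(depths ** 2 + (x - element_x[e]) ** 2)
            idx = np.clip((dist / c * fs).astype(int), 0, n_samp - 1)
            rf[i] += channel_data[e, idx]                # sum delayed samples
    env = np.abs(hilbert(rf, axis=1))                    # envelope detection
    return 20 * np.log10(env / env.max() + 1e-6)         # log compression -> dB
```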
Detecting Cancer Metastases on Gigapixel Pathology Images
Title | Detecting Cancer Metastases on Gigapixel Pathology Images |
Authors | Yun Liu, Krishna Gadepalli, Mohammad Norouzi, George E. Dahl, Timo Kohlberger, Aleksey Boyko, Subhashini Venugopalan, Aleksei Timofeev, Philip Q. Nelson, Greg S. Corrado, Jason D. Hipp, Lily Peng, Martin C. Stumpe |
Abstract | Each year, the treatment decisions for more than 230,000 breast cancer patients in the U.S. hinge on whether the cancer has metastasized away from the breast. Metastasis detection is currently performed by pathologists reviewing large expanses of biological tissues. This process is labor intensive and error-prone. We present a framework to automatically detect and localize tumors as small as 100 x 100 pixels in gigapixel microscopy images sized 100,000 x 100,000 pixels. Our method leverages a convolutional neural network (CNN) architecture and obtains state-of-the-art results on the Camelyon16 dataset in the challenging lesion-level tumor detection task. At 8 false positives per image, we detect 92.4% of the tumors, relative to 82.7% by the previous best automated approach. For comparison, a human pathologist attempting exhaustive search achieved 73.2% sensitivity. We achieve image-level AUC scores above 97% on both the Camelyon16 test set and an independent set of 110 slides. In addition, we discover that two slides in the Camelyon16 training set were erroneously labeled normal. Our approach could considerably reduce false negative rates in metastasis detection. |
Tasks | Medical Object Detection |
Published | 2017-03-03 |
URL | http://arxiv.org/abs/1703.02442v2 |
PDF | http://arxiv.org/pdf/1703.02442v2.pdf |
PWC | https://paperswithcode.com/paper/detecting-cancer-metastases-on-gigapixel |
Repo | https://github.com/Reemr/Cancer-detection |
Framework | none |
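A gigapixel slide cannot be fed to a CNN whole; the standard recipe this work builds on is patch-level classification tiled into a tumor-probability heatmap. A hedged sketch of that sliding-window step, where `model` is any patch classifier with a single tumor logit and the patch size and stride are illustrative:

```python
import numpy as np
import torch

@torch.no_grad()
def tumor_heatmap(model, slide, patch=299, stride=128):
    """Slide a patch classifier over a huge image.

    slide: (H, W, 3) uint8 array (in practice read lazily, e.g. via OpenSlide)
    returns a coarse (H//stride, W//stride) tumor-probability map.
    """
    H, W, _ = slide.shape
    heat = np.zeros((H // stride, W // stride), dtype=np.float32)
    for i in range(heat.shape[0]):
        for j in range(heat.shape[1]):
            y, x = i * stride, j * stride
            tile = slide[y:y + patch, x:x + patch]
            if tile.shape[:2] != (patch, patch):
                continue                     # skip partial border tiles
            t = torch.from_numpy(tile).permute(2, 0, 1).float().unsqueeze(0) / 255.0
            heat[i, j] = torch.sigmoid(model(t)).item()  # P(tumor) for this patch
    return heat
```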
Learning from Synthetic Humans
Title | Learning from Synthetic Humans |
Authors | Gül Varol, Javier Romero, Xavier Martin, Naureen Mahmood, Michael J. Black, Ivan Laptev, Cordelia Schmid |
Abstract | Estimating human pose, shape, and motion from images and videos is a fundamental challenge with many applications. Recent advances in 2D human pose estimation use large amounts of manually-labeled training data for learning convolutional neural networks (CNNs). Such data is time-consuming to acquire and difficult to extend. Moreover, manual labeling of 3D pose, depth and motion is impractical. In this work we present SURREAL (Synthetic hUmans foR REAL tasks): a new large-scale dataset with synthetically-generated but realistic images of people rendered from 3D sequences of human motion capture data. We generate more than 6 million frames together with ground truth pose, depth maps, and segmentation masks. We show that CNNs trained on our synthetic dataset allow for accurate human depth estimation and human part segmentation in real RGB images. Our results and the new dataset open up new possibilities for advancing person analysis using cheap and large-scale synthetic data. |
Tasks | Human Part Segmentation, Motion Capture, Pose Estimation |
Published | 2017-01-05 |
URL | http://arxiv.org/abs/1701.01370v3 |
PDF | http://arxiv.org/pdf/1701.01370v3.pdf |
PWC | https://paperswithcode.com/paper/learning-from-synthetic-humans |
Repo | https://github.com/gulvarol/surreal |
Framework | torch |
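Because every SURREAL frame is rendered, pixel-perfect labels such as part masks come for free, and training a part-segmentation CNN on them is ordinary supervised learning. A minimal sketch, assuming a generic fully-convolutional `seg_net` and a loader yielding (rendered frame, per-pixel part label) batches:

```python
import torch
import torch.nn as nn

def train_on_synthetic(seg_net, synthetic_loader, epochs=1):
    """Supervised part segmentation on rendered frames with free labels."""
    opt = torch.optim.Adam(seg_net.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for frames, part_masks in synthetic_loader:
            # frames: (B, 3, H, W) rendered RGB; part_masks: (B, H, W) int labels
            logits = seg_net(frames)            # (B, n_parts, H, W)
            loss = loss_fn(logits, part_masks)
            opt.zero_grad(); loss.backward(); opt.step()

# after training purely on synthetic data, evaluate on real RGB images.
```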
Projection Based Weight Normalization for Deep Neural Networks
Title | Projection Based Weight Normalization for Deep Neural Networks |
Authors | Lei Huang, Xianglong Liu, Bo Lang, Bo Li |
Abstract | Optimizing deep neural networks (DNNs) often suffers from ill-conditioning. We observe that the scaling-based weight-space symmetry in rectified nonlinear networks causes this negative effect. We therefore propose to constrain the incoming weights of each neuron to be unit-norm, which we formulate as an optimization problem over the Oblique manifold. A simple yet efficient method, referred to as projection based weight normalization (PBWN), is developed to solve this problem. PBWN executes standard gradient updates, followed by projecting the updated weights back onto the Oblique manifold. The proposed method has a regularizing effect and collaborates well with the commonly used batch normalization technique. We conduct comprehensive experiments on several widely-used image datasets, including CIFAR-10, CIFAR-100, SVHN and ImageNet, for supervised learning with state-of-the-art convolutional neural networks such as Inception, VGG and residual networks. The results show that our method consistently improves the performance of DNNs with different architectures. We also apply our method to the Ladder network for semi-supervised learning on the permutation-invariant MNIST dataset, where it outperforms the state-of-the-art methods: we obtain test errors of 2.52%, 1.06%, and 0.91% with only 20, 50, and 100 labeled samples, respectively. |
Tasks | |
Published | 2017-10-06 |
URL | http://arxiv.org/abs/1710.02338v1 |
PDF | http://arxiv.org/pdf/1710.02338v1.pdf |
PWC | https://paperswithcode.com/paper/projection-based-weight-normalization-for |
Repo | https://github.com/huangleiBuaa/NormProjection |
Framework | torch |
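The method itself is compact: take a standard gradient step, then project each neuron's incoming weight vector back to unit norm (i.e. onto the Oblique manifold). A PyTorch sketch of one such update:

```python
import torch

def pbwn_step(model, optimizer):
    """One PBWN update: standard gradient step, then projection."""
    optimizer.step()
    with torch.no_grad():
        for m in model.modules():
            if isinstance(m, (torch.nn.Linear, torch.nn.Conv2d)):
                w = m.weight.view(m.weight.shape[0], -1)   # one row per neuron
                w.div_(w.norm(dim=1, keepdim=True).clamp_min(1e-12))
```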
Past, Present, Future: A Computational Investigation of the Typology of Tense in 1000 Languages
Title | Past, Present, Future: A Computational Investigation of the Typology of Tense in 1000 Languages |
Authors | Ehsaneddin Asgari, Hinrich Schütze |
Abstract | We present SuperPivot, an analysis method for low-resource languages that occur in a superparallel corpus, i.e., in a corpus that contains an order of magnitude more languages than parallel corpora currently in use. We show that SuperPivot performs well for the crosslingual analysis of the linguistic phenomenon of tense. We produce analysis results for more than 1000 languages, conducting, to the best of our knowledge, the largest crosslingual computational study performed to date. We extend existing methodology for leveraging parallel corpora for typological analysis by overcoming a limiting assumption of earlier work: we only require that a linguistic feature is overtly marked in a few of the thousands of languages, as opposed to requiring that it be marked in all languages under investigation. |
Tasks | |
Published | 2017-04-28 |
URL | http://arxiv.org/abs/1704.08914v2 |
PDF | http://arxiv.org/pdf/1704.08914v2.pdf |
PWC | https://paperswithcode.com/paper/past-present-future-a-computational |
Repo | https://github.com/pywirrarika/naki |
Framework | none |