October 21, 2019

2980 words 14 mins read

Paper Group AWR 68

Weakly-supervised Visual Instrument-playing Action Detection in Videos. Variational Autoencoders for Collaborative Filtering. DeepEfficiency - optimal efficiency inversion in higher dimensions at the LHC. Low-rank semidefinite programming for the MAX2SAT problem. Inference Suboptimality in Variational Autoencoders. Weakly Supervised Deep Image Hash …

Weakly-supervised Visual Instrument-playing Action Detection in Videos

Title Weakly-supervised Visual Instrument-playing Action Detection in Videos
Authors Jen-Yu Liu, Yi-Hsuan Yang, Shyh-Kang Jeng
Abstract Instrument playing is among the most common scenes in music-related videos, which nowadays represent one of the largest sources of online videos. In order to understand the instrument-playing scenes in videos, it is important to know what instruments are played, when they are played, and where the playing actions occur in the scene. While audio-based recognition of instruments has been widely studied, the visual aspect of music instrument playing remains largely unaddressed in the literature. One of the main obstacles is the difficulty of collecting annotated action locations for training-based methods. To address this issue, we propose a weakly-supervised framework to find when and where instruments are played in videos. We use two auxiliary models to supervise the training of the instrument-playing action model: a sound model, which provides temporal supervision, and an object model, which provides spatial supervision. Together they provide simultaneous temporal and spatial supervision. The resulting model needs to analyze only the visual part of a music video to deduce which instruments are played, and when and where. We found that the proposed method significantly improves localization accuracy. We evaluate the result of the proposed method temporally and spatially on a small dataset (5,400 frames in total) that we manually annotated.
Tasks Action Detection
Published 2018-05-05
URL http://arxiv.org/abs/1805.02031v1
PDF http://arxiv.org/pdf/1805.02031v1.pdf
PWC https://paperswithcode.com/paper/weakly-supervised-visual-instrument-playing
Repo https://github.com/ciaua/InstrumentPlayingDetection
Framework pytorch
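
A minimal sketch of the weak-supervision scheme, assuming PyTorch-style tensors; the shapes, pooling choice, and loss forms are illustrative guesses, not the authors' implementation (which is in the linked repo):

```python
# Hedged sketch: the sound model supplies per-frame (temporal) targets and
# the object model supplies per-pixel (spatial) targets for the visual
# action model. Shapes, pooling, and loss forms are assumptions.
import torch
import torch.nn.functional as F

def weak_supervision_loss(action_map, sound_probs, object_map):
    """action_map:  (T, C, H, W) action logits from the visual model
    sound_probs: (T, C) per-frame instrument presence from the sound model
    object_map:  (T, C, H, W) instrument localization from the object model
    """
    # Temporal supervision: spatially pool the action map so each frame
    # yields one score per instrument, matched against the sound model.
    frame_scores = action_map.amax(dim=(2, 3))                    # (T, C)
    temporal = F.binary_cross_entropy_with_logits(frame_scores, sound_probs)

    # Spatial supervision: match the action map to the object model's map,
    # but only on frames where the sound model hears the instrument.
    active = (sound_probs > 0.5).float()[:, :, None, None]
    spatial = (F.binary_cross_entropy_with_logits(
        action_map, object_map, reduction="none") * active).mean()

    return temporal + spatial
```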

Variational Autoencoders for Collaborative Filtering

Title Variational Autoencoders for Collaborative Filtering
Authors Dawen Liang, Rahul G. Krishnan, Matthew D. Hoffman, Tony Jebara
Abstract We extend variational autoencoders (VAEs) to collaborative filtering for implicit feedback. This non-linear probabilistic model enables us to go beyond the limited modeling capacity of linear factor models, which still largely dominate collaborative filtering research. We introduce a generative model with multinomial likelihood and use Bayesian inference for parameter estimation. Despite widespread use in language modeling and economics, the multinomial likelihood receives less attention in the recommender systems literature. We introduce a different regularization parameter for the learning objective, which proves to be crucial for achieving competitive performance. Remarkably, there is an efficient way to tune the parameter using annealing. The resulting model and learning algorithm have information-theoretic connections to maximum entropy discrimination and the information bottleneck principle. Empirically, we show that the proposed approach significantly outperforms several state-of-the-art baselines, including two recently proposed neural network approaches, on several real-world datasets. We also provide extended experiments comparing the multinomial likelihood with other commonly used likelihood functions in the latent factor collaborative filtering literature and show favorable results. Finally, we identify the pros and cons of employing a principled Bayesian inference approach and characterize settings where it provides the most significant improvements.
Tasks Bayesian Inference, Language Modelling, Recommendation Systems
Published 2018-02-16
URL http://arxiv.org/abs/1802.05814v1
PDF http://arxiv.org/pdf/1802.05814v1.pdf
PWC https://paperswithcode.com/paper/variational-autoencoders-for-collaborative
Repo https://github.com/jaywonchung/BERT4Rec-VAE-Pytorch
Framework pytorch
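
A hedged sketch of the paper's two key ingredients, the multinomial likelihood and the annealed KL weight beta; layer sizes, input normalization, and the annealing endpoint are illustrative assumptions:

```python
# Sketch of the multinomial-likelihood VAE ("Mult-VAE") objective with the
# annealed KL weight beta. Sizes and normalization are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultVAE(nn.Module):
    def __init__(self, n_items, latent_dim=200):
        super().__init__()
        self.encoder = nn.Linear(n_items, 2 * latent_dim)  # -> (mu, logvar)
        self.decoder = nn.Linear(latent_dim, n_items)

    def loss(self, x, beta):
        # x: (batch, n_items) binary implicit-feedback matrix
        mu, logvar = self.encoder(F.normalize(x)).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        # Multinomial log-likelihood: log-softmax over the full item set,
        # summed over the items the user actually clicked.
        nll = -(F.log_softmax(self.decoder(z), dim=-1) * x).sum(-1).mean()
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1).mean()
        # beta is annealed upward from 0 during training; the paper finds
        # stopping below 1 is crucial for competitive performance.
        return nll + beta * kl
```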

DeepEfficiency - optimal efficiency inversion in higher dimensions at the LHC

Title DeepEfficiency - optimal efficiency inversion in higher dimensions at the LHC
Authors Mikael Mieskolainen
Abstract We introduce a new high-dimensional algorithm for efficiency-corrected, maximally Monte Carlo event-generator-independent fiducial measurements at the LHC and beyond. The approach is driven probabilistically, using a deep neural network on an event-by-event basis, trained with detector simulation or even with only pure phase-space distributed events. This approach also gives a glimpse into the future of high energy physics, where experiments publish new types of measurements in a radically multidimensional way.
Tasks
Published 2018-09-17
URL http://arxiv.org/abs/1809.06101v1
PDF http://arxiv.org/pdf/1809.06101v1.pdf
PWC https://paperswithcode.com/paper/deepefficiency-optimal-efficiency-inversion
Repo https://github.com/mieskolainen/DeepEfficiency
Framework tf
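
One plausible reading of the method, sketched below: learn the per-event detection efficiency epsilon(x) from detector simulation, then invert it as event weights. The architecture, feature count, and clipping floor are assumptions, not the paper's exact setup:

```python
# Sketch: learn epsilon(x) = P(detected | x) with a network trained on
# detector simulation, then correct observed events by 1/epsilon(x).
import torch
import torch.nn as nn

efficiency_net = nn.Sequential(
    nn.Linear(4, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1), nn.Sigmoid(),      # epsilon(x) in (0, 1)
)

def train_step(x_gen, detected, optimizer):
    # x_gen: (N, 4) generator-level event features
    # detected: (N,) float, 1.0 if the event survived detector simulation
    eps = efficiency_net(x_gen).squeeze(-1)
    loss = nn.functional.binary_cross_entropy(eps, detected)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def inverse_weights(x_obs, floor=1e-3):
    # Event-by-event efficiency inversion; clip to avoid huge weights
    # in regions of very low efficiency.
    with torch.no_grad():
        eps = efficiency_net(x_obs).squeeze(-1).clamp_min(floor)
    return 1.0 / eps
```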

Low-rank semidefinite programming for the MAX2SAT problem

Title Low-rank semidefinite programming for the MAX2SAT problem
Authors Po-Wei Wang, J. Zico Kolter
Abstract This paper proposes a new algorithm for solving MAX2SAT problems, based on combining search methods with semidefinite programming approaches. Semidefinite programming techniques are well known as a theoretical tool for approximating maximum satisfiability problems, but their application has traditionally been limited by their speed and randomized nature. Our approach overcomes this difficulty by using a recent approach to low-rank semidefinite programming, specialized to work in an incremental fashion suitable for use in an exact search algorithm. The method can be used within both complete and incomplete solvers, and we demonstrate it on a variety of problems from recent competitions. Our experiments show that the approach is faster (sometimes by orders of magnitude) than existing state-of-the-art complete and incomplete solvers, representing a substantial advance in search methods specialized for MAX2SAT problems.
Tasks
Published 2018-12-15
URL http://arxiv.org/abs/1812.06362v1
PDF http://arxiv.org/pdf/1812.06362v1.pdf
PWC https://paperswithcode.com/paper/low-rank-semidefinite-programming-for-the
Repo https://github.com/locuslab/mixsat
Framework none
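
For intuition, a toy version of the low-rank SDP coordinate-descent update (the "mixing" step) this line of work builds on; the clause-to-matrix construction and the rounding step are omitted, and all details here are assumptions:

```python
# Each Boolean variable becomes a unit vector on a rank-k sphere, repeatedly
# set to the normalized negative of its weighted neighborhood sum.
import numpy as np

def mixing_method(C, k, iters=100, seed=0):
    # C: (n, n) symmetric cost matrix; minimize <C, V V^T> over unit rows.
    C = C - np.diag(np.diag(C))          # v_i . v_i = 1, so diag is constant
    n = C.shape[0]
    rng = np.random.default_rng(seed)
    V = rng.normal(size=(n, k))
    V /= np.linalg.norm(V, axis=1, keepdims=True)
    for _ in range(iters):
        for i in range(n):
            g = C[i] @ V                 # gradient of the objective in v_i
            norm = np.linalg.norm(g)
            if norm > 1e-12:
                V[i] = -g / norm         # closed-form coordinate minimizer
    return V  # round, e.g. with a random hyperplane, to get an assignment
```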

Inference Suboptimality in Variational Autoencoders

Title Inference Suboptimality in Variational Autoencoders
Authors Chris Cremer, Xuechen Li, David Duvenaud
Abstract Amortized inference allows latent-variable models trained via variational learning to scale to large datasets. The quality of approximate inference is determined by two factors: a) the capacity of the variational distribution to match the true posterior and b) the ability of the recognition network to produce good variational parameters for each datapoint. We examine approximate inference in variational autoencoders in terms of these factors. We find that divergence from the true posterior is often due to imperfect recognition networks, rather than the limited complexity of the approximating distribution. We show that this is due partly to the generator learning to accommodate the choice of approximation. Furthermore, we show that the parameters used to increase the expressiveness of the approximation play a role in generalizing inference rather than simply improving the complexity of the approximation.
Tasks Latent Variable Models
Published 2018-01-10
URL http://arxiv.org/abs/1801.03558v3
PDF http://arxiv.org/pdf/1801.03558v3.pdf
PWC https://paperswithcode.com/paper/inference-suboptimality-in-variational
Repo https://github.com/chriscremer/Inference-Suboptimality
Framework pytorch
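
The paper's decomposition suggests a simple diagnostic, sketched below; model.encode and model.elbo are placeholder methods of a trained VAE, assumed purely for illustration:

```python
# How much ELBO does amortization leave behind for one datapoint?
import torch

def amortization_gap(model, x, steps=500, lr=1e-2):
    mu, logvar = model.encode(x)                  # amortized inference
    elbo_amortized = model.elbo(x, mu, logvar).detach()

    # Free the variational parameters from the encoder and optimize them
    # directly for this datapoint.
    mu = mu.detach().requires_grad_(True)
    logvar = logvar.detach().requires_grad_(True)
    opt = torch.optim.Adam([mu, logvar], lr=lr)
    for _ in range(steps):
        loss = -model.elbo(x, mu, logvar)
        opt.zero_grad()
        loss.backward()
        opt.step()

    # A large positive gap indicates the recognition network, not the
    # Gaussian family itself, is the bottleneck.
    return (model.elbo(x, mu, logvar) - elbo_amortized).item()
```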

Weakly Supervised Deep Image Hashing through Tag Embeddings

Title Weakly Supervised Deep Image Hashing through Tag Embeddings
Authors Vijetha Gattupalli, Yaoxin Zhuo, Baoxin Li
Abstract Many approaches to semantic image hashing have been formulated as supervised learning problems that utilize images and label information to learn the binary hash codes. However, large-scale labeled image data is expensive to obtain, thus imposing a restriction on the usage of such algorithms. On the other hand, unlabelled image data is abundant due to the existence of many Web image repositories. Such Web images may often come with image tags that contain useful information, although raw tags, in general, do not readily lead to semantic labels. Motivated by this scenario, we formulate the problem of semantic image hashing as a weakly-supervised learning problem. We utilize the information contained in the user-generated tags associated with the images to learn the hash codes. More specifically, we extract the word2vec semantic embeddings of the tags and use the information contained in them for constraining the learning. Accordingly, we name our model Weakly Supervised Deep Hashing using Tag Embeddings (WDHT). WDHT is tested on the task of semantic image retrieval and compared against several state-of-the-art models. Results show that our approach sets a new state of the art in weakly supervised image hashing.
Tasks Image Retrieval
Published 2018-06-15
URL http://arxiv.org/abs/1806.05804v3
PDF http://arxiv.org/pdf/1806.05804v3.pdf
PWC https://paperswithcode.com/paper/weakly-supervised-deep-image-hashing-through
Repo https://github.com/Vijetha1/WDHT
Framework tf
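
A rough sketch of how tag embeddings could supervise hash learning in the spirit of WDHT; the regression-plus-quantization objective and the projection layer below are assumptions about the idea, not a transcription of the paper's loss:

```python
# The hash code, projected into word2vec space, should predict the mean
# embedding of the image's tags, while a quantization term keeps codes
# near {-1, +1} so thresholding at 0 loses little information.
import torch
import torch.nn.functional as F

def wdht_style_loss(codes, tag_embeddings, proj, quant_weight=0.1):
    """codes: (B, bits) tanh outputs of the hashing network
    tag_embeddings: (B, 300) mean word2vec vector of each image's tags
    proj: nn.Linear(bits, 300) mapping codes into embedding space
    """
    semantic = F.mse_loss(proj(codes), tag_embeddings)
    quantization = (codes.abs() - 1.0).pow(2).mean()
    return semantic + quant_weight * quantization

def to_hash_bits(codes):
    return (codes > 0).to(torch.int8)   # threshold at 0 for the final bits
```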

Change-Point Detection on Hierarchical Circadian Models

Title Change-Point Detection on Hierarchical Circadian Models
Authors Pablo Moreno-Muñoz, David Ramírez, Antonio Artés-Rodríguez
Abstract This paper addresses the problem of change-point detection on sequences of high-dimensional and heterogeneous observations, which also possess a periodic temporal structure. Due to the dimensionality problem, when the time between change-points is on the order of the dimension of the model parameters, drifts in the underlying distribution can be misidentified as changes. To overcome this limitation, we assume that the observations lie in a lower-dimensional manifold that admits a latent variable representation. In particular, we propose a hierarchical model that is computationally feasible, widely applicable to heterogeneous data and robust to missing instances. Additionally, the observations' periodic dependencies are captured by non-stationary periodic covariance functions. The proposed technique is particularly well suited to (and motivated by) the problem of detecting changes in human behavior using smartphones, with application to relapse detection in psychiatric patients. Finally, we validate the technique on synthetic examples and demonstrate its utility in detecting behavioral changes using real data acquired by smartphones.
Tasks Change Point Detection
Published 2018-09-11
URL http://arxiv.org/abs/1809.04197v2
PDF http://arxiv.org/pdf/1809.04197v2.pdf
PWC https://paperswithcode.com/paper/change-point-detection-on-hierarchical
Repo https://github.com/pmorenoz/HierCPD
Framework none
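
For readers unfamiliar with the periodic-covariance ingredient, a small sketch: a standard periodic kernel, here made non-stationary with a time-varying amplitude. The kernel form and hyperparameters are illustrative, not the paper's:

```python
# Correlations repeat every `period` hours (circadian structure).
import numpy as np

def periodic_kernel(t1, t2, period=24.0, lengthscale=3.0, variance=1.0):
    d = np.subtract.outer(t1, t2)
    return variance * np.exp(
        -2.0 * np.sin(np.pi * d / period) ** 2 / lengthscale ** 2)

def nonstationary_periodic_kernel(t1, t2, amp):
    # Modulating by amp(t) lets the circadian pattern's strength drift,
    # one simple route to non-stationarity.
    return np.outer(amp(t1), amp(t2)) * periodic_kernel(t1, t2)

t = np.linspace(0.0, 72.0, 73)   # three days, hourly
K = nonstationary_periodic_kernel(t, t, amp=lambda s: 1.0 + 0.01 * s)
```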

ShapeShifter: Robust Physical Adversarial Attack on Faster R-CNN Object Detector

Title ShapeShifter: Robust Physical Adversarial Attack on Faster R-CNN Object Detector
Authors Shang-Tse Chen, Cory Cornelius, Jason Martin, Duen Horng Chau
Abstract Given the ability to directly manipulate image pixels in the digital input space, an adversary can easily generate imperceptible perturbations to fool a Deep Neural Network (DNN) image classifier, as demonstrated in prior work. In this work, we propose ShapeShifter, an attack that tackles the more challenging problem of crafting physical adversarial perturbations to fool image-based object detectors like Faster R-CNN. Attacking an object detector is more difficult than attacking an image classifier, as it needs to mislead the classification results in multiple bounding boxes with different scales. Extending the digital attack to the physical world adds another layer of difficulty, because it requires the perturbation to be robust enough to survive real-world distortions due to different viewing distances and angles, lighting conditions, and camera limitations. We show that the Expectation over Transformation technique, which was originally proposed to enhance the robustness of adversarial perturbations in image classification, can be successfully adapted to the object detection setting. ShapeShifter can generate adversarially perturbed stop signs that are consistently mis-detected by Faster R-CNN as other objects, posing a potential threat to autonomous vehicles and other safety-critical computer vision systems.
Tasks Adversarial Attack, Autonomous Vehicles, Image Classification, Object Detection
Published 2018-04-16
URL http://arxiv.org/abs/1804.05810v3
PDF http://arxiv.org/pdf/1804.05810v3.pdf
PWC https://paperswithcode.com/paper/shapeshifter-robust-physical-adversarial
Repo https://github.com/shangtse/robust-physical-attack
Framework tf
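
A compact sketch of an Expectation over Transformation step adapted to detection; detector_loss and random_transform are placeholders for the Faster R-CNN loss and a differentiable transformation sampler, and the tanh parameterization (images assumed in [-1, 1]) is also an assumption:

```python
# Optimize the perturbation against the detector's loss averaged over
# random physical-world transformations so it survives real distortions.
import torch

def eot_step(perturbation, scene, mask, target, detector_loss,
             random_transform, optimizer, samples=8):
    optimizer.zero_grad()
    loss = 0.0
    for _ in range(samples):
        # Paste the perturbed sign into the scene, then distort it the way
        # the physical world might (scale, rotation, lighting, ...).
        adv = scene * (1 - mask) + perturbation.tanh() * mask
        loss = loss + detector_loss(random_transform(adv), target)
    (loss / samples).backward()
    optimizer.step()
```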

MegaDepth: Learning Single-View Depth Prediction from Internet Photos

Title MegaDepth: Learning Single-View Depth Prediction from Internet Photos
Authors Zhengqi Li, Noah Snavely
Abstract Single-view depth prediction is a fundamental problem in computer vision. Recently, deep learning methods have led to significant progress, but such methods are limited by the available training data. Current datasets based on 3D sensors have key limitations, including indoor-only images (NYU), small numbers of training examples (Make3D), and sparse sampling (KITTI). We propose to use multi-view Internet photo collections, a virtually unlimited data source, to generate training data via modern structure-from-motion and multi-view stereo (MVS) methods, and present a large depth dataset called MegaDepth based on this idea. Data derived from MVS comes with its own challenges, including noise and unreconstructable objects. We address these challenges with new data cleaning methods, as well as automatically augmenting our data with ordinal depth relations generated using semantic segmentation. We validate the use of large amounts of Internet data by showing that models trained on MegaDepth exhibit strong generalization: not only to novel scenes, but also to other diverse datasets including Make3D, KITTI, and DIW, even when no images from those datasets are seen during training.
Tasks Depth Estimation, Semantic Segmentation
Published 2018-04-02
URL http://arxiv.org/abs/1804.00607v4
PDF http://arxiv.org/pdf/1804.00607v4.pdf
PWC https://paperswithcode.com/paper/megadepth-learning-single-view-depth
Repo https://github.com/zhengqili/MegaDepth
Framework pytorch
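
The ordinal-supervision idea can be sketched as a pairwise ranking loss on predicted log-depths; the exact form below follows the general style of relative-depth losses and is an assumption, not MegaDepth's verbatim objective:

```python
# For pixel pairs where segmentation implies an ordering (e.g. a person in
# front of a building), penalize predictions that order them the wrong way.
import torch
import torch.nn.functional as F

def ordinal_loss(log_depth, pairs, relations):
    """log_depth: (H, W) predicted log-depth map
    pairs: (N, 4) long tensor of pixel pairs (yi, xi, yj, xj)
    relations: (N,) float, +1 if point i is closer than point j, else -1
    """
    zi = log_depth[pairs[:, 0], pairs[:, 1]]
    zj = log_depth[pairs[:, 2], pairs[:, 3]]
    # Correctly ordered pairs (closer point has smaller log-depth) incur
    # nearly zero cost; inverted pairs are penalized smoothly.
    return F.softplus(relations * (zi - zj)).mean()
```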

Dynamic Fusion with Intra- and Inter- Modality Attention Flow for Visual Question Answering

Title Dynamic Fusion with Intra- and Inter- Modality Attention Flow for Visual Question Answering
Authors Gao Peng, Zhengkai Jiang, Haoxuan You, Pan Lu, Steven Hoi, Xiaogang Wang, Hongsheng Li
Abstract Learning effective fusion of multi-modality features is at the heart of visual question answering. We propose a novel method for dynamically fusing multi-modal features with intra- and inter-modality information flow, which alternately passes dynamic information within and across the visual and language modalities. It robustly captures high-level interactions between the language and vision domains, significantly improving visual question answering performance. We also show that the proposed dynamic intra-modality attention flow, conditioned on the other modality, can dynamically modulate the intra-modality attention of the target modality, which is vital for multi-modality feature fusion. Experimental evaluations on the VQA 2.0 dataset show that the proposed method achieves state-of-the-art VQA performance. Extensive ablation studies are carried out for a comprehensive analysis of the proposed method.
Tasks Question Answering, Visual Question Answering
Published 2018-12-13
URL https://arxiv.org/abs/1812.05252v4
PDF https://arxiv.org/pdf/1812.05252v4.pdf
PWC https://paperswithcode.com/paper/dynamic-fusion-with-intra-and-inter-modality
Repo https://github.com/bupt-cist/DFAF-for-VQA.pytorch
Framework pytorch
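
A simplified single-head sketch of one inter-modality attention-flow step; the paper's actual blocks are multi-head and also include dynamic intra-modality flows, so everything below is an illustrative reduction:

```python
# Regions attend over words and words attend over regions, symmetrically,
# passing information across the two modalities.
import torch
import torch.nn.functional as F

def inter_modality_flow(vis, txt):
    """vis: (B, Nv, d) visual region features; txt: (B, Nt, d) word features."""
    d = vis.shape[-1]
    attn_v = F.softmax(vis @ txt.transpose(1, 2) / d ** 0.5, dim=-1)
    vis_out = vis + attn_v @ txt      # vision updated with language info
    attn_t = F.softmax(txt @ vis.transpose(1, 2) / d ** 0.5, dim=-1)
    txt_out = txt + attn_t @ vis      # language updated with visual info
    return vis_out, txt_out
```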

DeepDiff: Deep-learning for predicting Differential gene expression from histone modifications

Title DeepDiff: Deep-learning for predicting Differential gene expression from histone modifications
Authors Arshdeep Sekhon, Ritambhara Singh, Yanjun Qi
Abstract Computational methods that predict differential gene expression from histone modification signals are highly desirable for understanding how histone modifications control the functional heterogeneity of cells through influencing differential gene regulation. Recent studies either failed to capture combinatorial effects on differential prediction or focused primarily on cell-type-specific analysis. In this paper, we develop a novel attention-based deep learning architecture, DeepDiff, that provides a unified and end-to-end solution to model and to interpret how dependencies among histone modifications control the differential patterns of gene regulation. DeepDiff uses a hierarchy of multiple long short-term memory (LSTM) modules to encode the spatial structure of input signals and to model how various histone modifications cooperate automatically. We introduce and train two levels of attention jointly with the target prediction, enabling DeepDiff to attend differentially to relevant modifications and to locate important genome positions for each modification. Additionally, DeepDiff introduces a novel deep-learning based multi-task formulation that uses cell-type-specific gene expression predictions as auxiliary tasks, encouraging richer feature embeddings in our primary task of differential expression prediction. Using data from the Roadmap Epigenomics Project (REMC) for ten different pairs of cell types, we show that DeepDiff significantly outperforms the state-of-the-art baselines for differential gene expression prediction. The learned attention weights are validated by observations from previous studies about how epigenetic mechanisms connect to differential gene expression. Code and results are available at deepchrome.org.
Tasks
Published 2018-07-10
URL http://arxiv.org/abs/1807.03878v1
PDF http://arxiv.org/pdf/1807.03878v1.pdf
PWC https://paperswithcode.com/paper/deepdiff-deep-learning-for-predicting
Repo https://github.com/QData/DeepDiffChrome
Framework pytorch
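
A rough sketch of the two attention levels described in the abstract, with illustrative sizes; this is a guess at the structure, not the DeepDiffChrome code:

```python
# Bin-level attention locates important genome positions within each
# histone modification (HM) track; HM-level attention weighs the
# modifications against each other before the final prediction.
import torch
import torch.nn as nn

class TwoLevelAttention(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(1, hidden, batch_first=True, bidirectional=True)
        self.bin_attn = nn.Linear(2 * hidden, 1)   # level 1: positions
        self.hm_attn = nn.Linear(2 * hidden, 1)    # level 2: modifications
        self.out = nn.Linear(2 * hidden, 1)

    def forward(self, x):                          # x: (B, n_hms, n_bins)
        B, M, T = x.shape
        h, _ = self.lstm(x.reshape(B * M, T, 1))   # encode each HM track
        a = torch.softmax(self.bin_attn(h), dim=1) # attend along the gene
        hm = (a * h).sum(dim=1).reshape(B, M, -1)  # one vector per HM
        b = torch.softmax(self.hm_attn(hm), dim=1) # attend across HMs
        return self.out((b * hm).sum(dim=1))       # (B, 1) prediction
```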

Exploring Recombination for Efficient Decoding of Neural Machine Translation

Title Exploring Recombination for Efficient Decoding of Neural Machine Translation
Authors Zhisong Zhang, Rui Wang, Masao Utiyama, Eiichiro Sumita, Hai Zhao
Abstract In Neural Machine Translation (NMT), the decoder can capture the features of the entire prediction history with neural connections and representations. This means that partial hypotheses with different prefixes will be regarded differently no matter how similar they are. However, this might be inefficient since some partial hypotheses can contain only local differences that will not influence future predictions. In this work, we introduce recombination in NMT decoding based on the concept of the "equivalence" of partial hypotheses. Heuristically, we use a simple n-gram suffix-based equivalence function and adapt it into beam search decoding. Through experiments on large-scale Chinese-to-English and English-to-German translation tasks, we show that the proposed method can obtain similar translation quality with a smaller beam size, making NMT decoding more efficient.
Tasks Machine Translation
Published 2018-08-25
URL http://arxiv.org/abs/1808.08482v2
PDF http://arxiv.org/pdf/1808.08482v2.pdf
PWC https://paperswithcode.com/paper/exploring-recombination-for-efficient
Repo https://github.com/zzsfornlp/znmt-merge
Framework none
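
The recombination heuristic itself is simple enough to sketch directly; the (tokens, score) hypothesis representation below is an assumption:

```python
# Hypotheses ending in the same n tokens are treated as equivalent, and
# only the best-scoring representative of each class is kept.
def recombine(beam, n=4):
    """beam: list of (tokens, score) partial hypotheses; higher score wins."""
    best = {}
    for tokens, score in beam:
        suffix = tuple(tokens[-n:])          # equivalence class: last n tokens
        if suffix not in best or score > best[suffix][1]:
            best[suffix] = (tokens, score)
    return list(best.values())

# With n=2, "the cat sat on the" and "a cat sat on the" recombine: the
# heuristic assumes future predictions conditioned on the shared suffix
# "on the" will be (near-)identical.
```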

TrackingNet: A Large-Scale Dataset and Benchmark for Object Tracking in the Wild

Title TrackingNet: A Large-Scale Dataset and Benchmark for Object Tracking in the Wild
Authors Matthias Müller, Adel Bibi, Silvio Giancola, Salman Al-Subaihi, Bernard Ghanem
Abstract Despite the numerous developments in object tracking, further development of current tracking algorithms is limited by small and mostly saturated datasets. As a matter of fact, data-hungry trackers based on deep learning currently rely on object detection datasets due to the scarcity of dedicated large-scale tracking datasets. In this work, we present TrackingNet, the first large-scale dataset and benchmark for object tracking in the wild. We provide more than 30K videos with more than 14 million dense bounding box annotations. Our dataset covers a wide selection of object classes in broad and diverse contexts. By releasing such a large-scale dataset, we expect deep trackers to further improve and generalize. In addition, we introduce a new benchmark composed of 500 novel videos, modeled with a distribution similar to our training dataset. By sequestering the annotation of the test set and providing an online evaluation server, we provide a fair benchmark for future development of object trackers. Deep trackers fine-tuned on a fraction of our dataset improve their performance by up to 1.6% on OTB100 and up to 1.7% on TrackingNet Test. We provide an extensive benchmark on TrackingNet by evaluating more than 20 trackers. Our results suggest that object tracking in the wild is far from being solved.
Tasks Object Detection, Object Tracking
Published 2018-03-28
URL http://arxiv.org/abs/1803.10794v1
PDF http://arxiv.org/pdf/1803.10794v1.pdf
PWC https://paperswithcode.com/paper/trackingnet-a-large-scale-dataset-and
Repo https://github.com/SilvioGiancola/TrackingNet-devkit
Framework none

Classification of Point Cloud Scenes with Multiscale Voxel Deep Network

Title Classification of Point Cloud Scenes with Multiscale Voxel Deep Network
Authors Xavier Roynard, Jean-Emmanuel Deschaud, François Goulette
Abstract In this article we describe a new convolutional neural network (CNN) to classify 3D point clouds of urban or indoor scenes. Solutions are given to the problems encountered when working on scene point clouds, and a network is described that classifies points using only their positions in a multi-scale neighborhood. On the reduced-8 Semantic3D benchmark [Hackel et al., 2017], this network, ranked second, beats the state of the art among point classification methods (those not using a regularization step).
Tasks Semantic Segmentation
Published 2018-04-10
URL http://arxiv.org/abs/1804.03583v1
PDF http://arxiv.org/pdf/1804.03583v1.pdf
PWC https://paperswithcode.com/paper/classification-of-point-cloud-scenes-with
Repo https://github.com/xroynard/ms_deepvoxscene
Framework pytorch
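
A sketch of the multi-scale voxel input such a network consumes; the resolution and scales below are illustrative assumptions:

```python
# For each query point, build occupancy grids of its neighborhood at
# several voxel sizes, giving the CNN both fine local geometry and
# coarser context at once.
import numpy as np

def multiscale_occupancy(points, center, scales=(0.1, 0.5, 2.5), res=16):
    """points: (N, 3) scene point cloud; center: (3,) query point."""
    grids = []
    for voxel in scales:
        half = voxel * res / 2
        local = points[np.all(np.abs(points - center) < half, axis=1)]
        idx = np.clip(np.floor((local - center + half) / voxel).astype(int),
                      0, res - 1)
        grid = np.zeros((res, res, res), dtype=np.float32)
        grid[idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0   # mark occupied voxels
        grids.append(grid)
    return grids   # one grid per scale, fed to parallel 3D-CNN branches
```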

Efficient GAN-Based Anomaly Detection

Title Efficient GAN-Based Anomaly Detection
Authors Houssam Zenati, Chuan Sheng Foo, Bruno Lecouat, Gaurav Manek, Vijay Ramaseshan Chandrasekhar
Abstract Generative adversarial networks (GANs) are able to model the complex high-dimensional distributions of real-world data, which suggests they could be effective for anomaly detection. However, few works have explored the use of GANs for the anomaly detection task. We leverage recently developed GAN models for anomaly detection, and achieve state-of-the-art performance on image and network-intrusion datasets, while being several hundred-fold faster at test time than the only published GAN-based method.
Tasks Anomaly Detection
Published 2018-02-17
URL http://arxiv.org/abs/1802.06222v2
PDF http://arxiv.org/pdf/1802.06222v2.pdf
PWC https://paperswithcode.com/paper/efficient-gan-based-anomaly-detection
Repo https://github.com/houssamzenati/Efficient-GAN-Anomaly-Detection
Framework tf
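
A sketch of a BiGAN-style anomaly score in the spirit of this work; the networks and the mixing weight alpha are placeholders, and the exact score components are assumptions:

```python
# The learned encoder E makes test-time inference a single forward pass
# (the source of the speedup over iterative latent search); the score
# mixes reconstruction error with a discriminator feature-matching term.
import torch

def anomaly_score(x, E, G, D_features, alpha=0.9):
    """E: encoder x -> z;  G: generator z -> x;  D_features: x -> features."""
    with torch.no_grad():
        recon = G(E(x))                               # reconstruct via latent
        lg = (x - recon).flatten(1).norm(p=1, dim=1)  # reconstruction term
        ld = (D_features(x) - D_features(recon)
              ).flatten(1).norm(p=1, dim=1)           # feature-matching term
    return alpha * lg + (1 - alpha) * ld              # higher = more anomalous
```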