October 21, 2019

2980 words 14 mins read

Paper Group AWR 68

Weakly-supervised Visual Instrument-playing Action Detection in Videos. Variational Autoencoders for Collaborative Filtering. DeepEfficiency - optimal efficiency inversion in higher dimensions at the LHC. Low-rank semidefinite programming for the MAX2SAT problem. Inference Suboptimality in Variational Autoencoders. Weakly Supervised Deep Image Hash …

Weakly-supervised Visual Instrument-playing Action Detection in Videos

Title Weakly-supervised Visual Instrument-playing Action Detection in Videos
Authors Jen-Yu Liu, Yi-Hsuan Yang, Shyh-Kang Jeng
Abstract Instrument playing is among the most common scenes in music-related videos, which nowadays represent one of the largest sources of online videos. In order to understand the instrument-playing scenes in videos, it is important to know what instruments are played, when they are played, and where the playing actions occur in the scene. While audio-based recognition of instruments has been widely studied, the visual aspect of music instrument playing remains largely unaddressed in the literature. One of the main obstacles is the difficulty of collecting annotated action locations for training-based methods. To address this issue, we propose a weakly-supervised framework to find when and where instruments are played in videos. We use two auxiliary models to supervise the training of the instrument-playing action model: a sound model, which provides temporal supervision, and an object model, which provides spatial supervision. Together they provide simultaneous temporal and spatial supervision. The resulting model needs to analyze only the visual part of a music video to deduce which instruments are played, and when and where. We found that the proposed method significantly improves localization accuracy. We evaluate the result of the proposed method temporally and spatially on a small dataset (5,400 frames in total) that we manually annotated.
Tasks Action Detection
Published 2018-05-05
URL http://arxiv.org/abs/1805.02031v1
PDF http://arxiv.org/pdf/1805.02031v1.pdf
PWC https://paperswithcode.com/paper/weakly-supervised-visual-instrument-playing
Repo https://github.com/ciaua/InstrumentPlayingDetection
Framework pytorch
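
A minimal sketch of the weak-supervision scheme, assuming PyTorch-style tensors; the shapes, pooling choice, and loss forms are illustrative guesses, not the authors' implementation (which is in the linked repo):

```python
# Hedged sketch: the sound model supplies per-frame (temporal) targets and
# the object model supplies per-pixel (spatial) targets for the visual
# action model. Shapes, pooling, and loss forms are assumptions.
import torch
import torch.nn.functional as F

def weak_supervision_loss(action_map, sound_probs, object_map):
    """action_map:  (T, C, H, W) action logits from the visual model
    sound_probs: (T, C) per-frame instrument presence from the sound model
    object_map:  (T, C, H, W) instrument localization from the object model
    """
    # Temporal supervision: spatially pool the action map so each frame
    # yields one score per instrument, matched against the sound model.
    frame_scores = action_map.amax(dim=(2, 3))                    # (T, C)
    temporal = F.binary_cross_entropy_with_logits(frame_scores, sound_probs)

    # Spatial supervision: match the action map to the object model's map,
    # but only on frames where the sound model hears the instrument.
    active = (sound_probs > 0.5).float()[:, :, None, None]
    spatial = (F.binary_cross_entropy_with_logits(
        action_map, object_map, reduction="none") * active).mean()

    return temporal + spatial
```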

Variational Autoencoders for Collaborative Filtering

Title Variational Autoencoders for Collaborative Filtering
Authors Dawen Liang, Rahul G. Krishnan, Matthew D. Hoffman, Tony Jebara
Abstract We extend variational autoencoders (VAEs) to collaborative filtering for implicit feedback. This non-linear probabilistic model enables us to go beyond the limited modeling capacity of linear factor models, which still largely dominate collaborative filtering research. We introduce a generative model with multinomial likelihood and use Bayesian inference for parameter estimation. Despite widespread use in language modeling and economics, the multinomial likelihood receives less attention in the recommender systems literature. We introduce a different regularization parameter for the learning objective, which proves to be crucial for achieving competitive performance. Remarkably, there is an efficient way to tune the parameter using annealing. The resulting model and learning algorithm have information-theoretic connections to maximum entropy discrimination and the information bottleneck principle. Empirically, we show that the proposed approach significantly outperforms several state-of-the-art baselines, including two recently proposed neural network approaches, on several real-world datasets. We also provide extended experiments comparing the multinomial likelihood with other commonly used likelihood functions in the latent factor collaborative filtering literature and show favorable results. Finally, we identify the pros and cons of employing a principled Bayesian inference approach and characterize settings where it provides the most significant improvements.
Tasks Bayesian Inference, Language Modelling, Recommendation Systems
Published 2018-02-16
URL http://arxiv.org/abs/1802.05814v1
PDF http://arxiv.org/pdf/1802.05814v1.pdf
PWC https://paperswithcode.com/paper/variational-autoencoders-for-collaborative
Repo https://github.com/jaywonchung/BERT4Rec-VAE-Pytorch
Framework pytorch
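
A hedged sketch of the paper's two key ingredients, the multinomial likelihood and the annealed KL weight beta; layer sizes, input normalization, and the annealing endpoint are illustrative assumptions:

```python
# Sketch of the multinomial-likelihood VAE ("Mult-VAE") objective with the
# annealed KL weight beta. Sizes and normalization are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultVAE(nn.Module):
    def __init__(self, n_items, latent_dim=200):
        super().__init__()
        self.encoder = nn.Linear(n_items, 2 * latent_dim)  # -> (mu, logvar)
        self.decoder = nn.Linear(latent_dim, n_items)

    def loss(self, x, beta):
        # x: (batch, n_items) binary implicit-feedback matrix
        mu, logvar = self.encoder(F.normalize(x)).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        # Multinomial log-likelihood: log-softmax over the full item set,
        # summed over the items the user actually clicked.
        nll = -(F.log_softmax(self.decoder(z), dim=-1) * x).sum(-1).mean()
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1).mean()
        # beta is annealed upward from 0 during training; the paper finds
        # stopping below 1 is crucial for competitive performance.
        return nll + beta * kl
```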

DeepEfficiency - optimal efficiency inversion in higher dimensions at the LHC

Title DeepEfficiency - optimal efficiency inversion in higher dimensions at the LHC
Authors Mikael Mieskolainen
Abstract We introduce a new high-dimensional algorithm for efficiency-corrected, maximally Monte Carlo event-generator-independent fiducial measurements at the LHC and beyond. The approach is driven probabilistically, using a deep neural network on an event-by-event basis, trained with detector simulation or even with only pure phase-space distributed events. This approach also gives a glimpse into the future of high energy physics, where experiments publish new types of measurements in a radically multidimensional way.
Tasks
Published 2018-09-17
URL http://arxiv.org/abs/1809.06101v1
PDF http://arxiv.org/pdf/1809.06101v1.pdf
PWC https://paperswithcode.com/paper/deepefficiency-optimal-efficiency-inversion
Repo https://github.com/mieskolainen/DeepEfficiency
Framework tf
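
One plausible reading of the method, sketched below: learn the per-event detection efficiency epsilon(x) from detector simulation, then invert it as event weights. The architecture, feature count, and clipping floor are assumptions, not the paper's exact setup:

```python
# Sketch: learn epsilon(x) = P(detected | x) with a network trained on
# detector simulation, then correct observed events by 1/epsilon(x).
import torch
import torch.nn as nn

efficiency_net = nn.Sequential(
    nn.Linear(4, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1), nn.Sigmoid(),      # epsilon(x) in (0, 1)
)

def train_step(x_gen, detected, optimizer):
    # x_gen: (N, 4) generator-level event features
    # detected: (N,) float, 1.0 if the event survived detector simulation
    eps = efficiency_net(x_gen).squeeze(-1)
    loss = nn.functional.binary_cross_entropy(eps, detected)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def inverse_weights(x_obs, floor=1e-3):
    # Event-by-event efficiency inversion; clip to avoid huge weights
    # in regions of very low efficiency.
    with torch.no_grad():
        eps = efficiency_net(x_obs).squeeze(-1).clamp_min(floor)
    return 1.0 / eps
```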

Low-rank semidefinite programming for the MAX2SAT problem

Title Low-rank semidefinite programming for the MAX2SAT problem
Authors Po-Wei Wang, J. Zico Kolter
Abstract This paper proposes a new algorithm for solving MAX2SAT problems, based on combining search methods with semidefinite programming approaches. Semidefinite programming techniques are well known as a theoretical tool for approximating maximum satisfiability problems, but their application has traditionally been limited by their speed and randomized nature. Our approach overcomes this difficulty by using a recent approach to low-rank semidefinite programming, specialized to work in an incremental fashion suitable for use in an exact search algorithm. The method can be used within both complete and incomplete solvers, and we demonstrate it on a variety of problems from recent competitions. Our experiments show that the approach is faster (sometimes by orders of magnitude) than existing state-of-the-art complete and incomplete solvers, representing a substantial advance in search methods specialized for MAX2SAT problems.
Tasks
Published 2018-12-15
URL http://arxiv.org/abs/1812.06362v1
PDF http://arxiv.org/pdf/1812.06362v1.pdf
PWC https://paperswithcode.com/paper/low-rank-semidefinite-programming-for-the
Repo https://github.com/locuslab/mixsat
Framework none
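
For intuition, a toy version of the low-rank SDP coordinate-descent update (the "mixing" step) this line of work builds on; the clause-to-matrix construction and the rounding step are omitted, and all details here are assumptions:

```python
# Each Boolean variable becomes a unit vector on a rank-k sphere, repeatedly
# set to the normalized negative of its weighted neighborhood sum.
import numpy as np

def mixing_method(C, k, iters=100, seed=0):
    # C: (n, n) symmetric cost matrix; minimize <C, V V^T> over unit rows.
    C = C - np.diag(np.diag(C))          # v_i . v_i = 1, so diag is constant
    n = C.shape[0]
    rng = np.random.default_rng(seed)
    V = rng.normal(size=(n, k))
    V /= np.linalg.norm(V, axis=1, keepdims=True)
    for _ in range(iters):
        for i in range(n):
            g = C[i] @ V                 # gradient of the objective in v_i
            norm = np.linalg.norm(g)
            if norm > 1e-12:
                V[i] = -g / norm         # closed-form coordinate minimizer
    return V  # round, e.g. with a random hyperplane, to get an assignment
```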

Inference Suboptimality in Variational Autoencoders

Title Inference Suboptimality in Variational Autoencoders
Authors Chris Cremer, Xuechen Li, David Duvenaud
Abstract Amortized inference allows latent-variable models trained via variational learning to scale to large datasets. The quality of approximate inference is determined by two factors: a) the capacity of the variational distribution to match the true posterior and b) the ability of the recognition network to produce good variational parameters for each datapoint. We examine approximate inference in variational autoencoders in terms of these factors. We find that divergence from the true posterior is often due to imperfect recognition networks, rather than the limited complexity of the approximating distribution. We show that this is due partly to the generator learning to accommodate the choice of approximation. Furthermore, we show that the parameters used to increase the expressiveness of the approximation play a role in generalizing inference rather than simply improving the complexity of the approximation.
Tasks Latent Variable Models
Published 2018-01-10
URL http://arxiv.org/abs/1801.03558v3
PDF http://arxiv.org/pdf/1801.03558v3.pdf
PWC https://paperswithcode.com/paper/inference-suboptimality-in-variational
Repo https://github.com/chriscremer/Inference-Suboptimality
Framework pytorch
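
The paper's decomposition suggests a simple diagnostic, sketched below; model.encode and model.elbo are placeholder methods of a trained VAE, assumed purely for illustration:

```python
# How much ELBO does amortization leave behind for one datapoint?
import torch

def amortization_gap(model, x, steps=500, lr=1e-2):
    mu, logvar = model.encode(x)                  # amortized inference
    elbo_amortized = model.elbo(x, mu, logvar).detach()

    # Free the variational parameters from the encoder and optimize them
    # directly for this datapoint.
    mu = mu.detach().requires_grad_(True)
    logvar = logvar.detach().requires_grad_(True)
    opt = torch.optim.Adam([mu, logvar], lr=lr)
    for _ in range(steps):
        loss = -model.elbo(x, mu, logvar)
        opt.zero_grad()
        loss.backward()
        opt.step()

    # A large positive gap indicates the recognition network, not the
    # Gaussian family itself, is the bottleneck.
    return (model.elbo(x, mu, logvar) - elbo_amortized).item()
```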

Weakly Supervised Deep Image Hashing through Tag Embeddings

Title Weakly Supervised Deep Image Hashing through Tag Embeddings
Authors Vijetha Gattupalli, Yaoxin Zhuo, Baoxin Li
Abstract Many approaches to semantic image hashing have been formulated as supervised learning problems that utilize images and label information to learn the binary hash codes. However, large-scale labeled image data is expensive to obtain, thus imposing a restriction on the usage of such algorithms. On the other hand, unlabelled image data is abundant due to the existence of many Web image repositories. Such Web images may often come with image tags that contain useful information, although raw tags, in general, do not readily lead to semantic labels. Motivated by this scenario, we formulate the problem of semantic image hashing as a weakly-supervised learning problem. We utilize the information contained in the user-generated tags associated with the images to learn the hash codes. More specifically, we extract the word2vec semantic embeddings of the tags and use the information contained in them for constraining the learning. Accordingly, we name our model Weakly Supervised Deep Hashing using Tag Embeddings (WDHT). WDHT is tested on the task of semantic image retrieval and compared against several state-of-the-art models. Results show that our approach sets a new state of the art in weakly supervised image hashing.
Tasks Image Retrieval
Published 2018-06-15
URL http://arxiv.org/abs/1806.05804v3
PDF http://arxiv.org/pdf/1806.05804v3.pdf
PWC https://paperswithcode.com/paper/weakly-supervised-deep-image-hashing-through
Repo https://github.com/Vijetha1/WDHT
Framework tf
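
A rough sketch of how tag embeddings could supervise hash learning in the spirit of WDHT; the regression-plus-quantization objective and the projection layer below are assumptions about the idea, not a transcription of the paper's loss:

```python
# The hash code, projected into word2vec space, should predict the mean
# embedding of the image's tags, while a quantization term keeps codes
# near {-1, +1} so thresholding at 0 loses little information.
import torch
import torch.nn.functional as F

def wdht_style_loss(codes, tag_embeddings, proj, quant_weight=0.1):
    """codes: (B, bits) tanh outputs of the hashing network
    tag_embeddings: (B, 300) mean word2vec vector of each image's tags
    proj: nn.Linear(bits, 300) mapping codes into embedding space
    """
    semantic = F.mse_loss(proj(codes), tag_embeddings)
    quantization = (codes.abs() - 1.0).pow(2).mean()
    return semantic + quant_weight * quantization

def to_hash_bits(codes):
    return (codes > 0).to(torch.int8)   # threshold at 0 for the final bits
```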

Change-Point Detection on Hierarchical Circadian Models

Title Change-Point Detection on Hierarchical Circadian Models
Authors Pablo Moreno-Muñoz, David Ramírez, Antonio Artés-Rodríguez
Abstract This paper addresses the problem of change-point detection on sequences of high-dimensional and heterogeneous observations, which also possess a periodic temporal structure. Due to the dimensionality problem, when the time between change-points is on the order of the dimension of the model parameters, drifts in the underlying distribution can be misidentified as changes. To overcome this limitation, we assume that the observations lie in a lower-dimensional manifold that admits a latent variable representation. In particular, we propose a hierarchical model that is computationally feasible, widely applicable to heterogeneous data and robust to missing instances. Additionally, the observations' periodic dependencies are captured by non-stationary periodic covariance functions. The proposed technique is particularly well suited to (and motivated by) the problem of detecting changes in human behavior using smartphones, with application to relapse detection in psychiatric patients. Finally, we validate the technique on synthetic examples and demonstrate its utility in detecting behavioral changes using real data acquired by smartphones.
Tasks Change Point Detection
Published 2018-09-11
URL http://arxiv.org/abs/1809.04197v2
PDF http://arxiv.org/pdf/1809.04197v2.pdf
PWC https://paperswithcode.com/paper/change-point-detection-on-hierarchical
Repo https://github.com/pmorenoz/HierCPD
Framework none
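
For readers unfamiliar with the periodic-covariance ingredient, a small sketch: a standard periodic kernel, here made non-stationary with a time-varying amplitude. The kernel form and hyperparameters are illustrative, not the paper's:

```python
# Correlations repeat every `period` hours (circadian structure).
import numpy as np

def periodic_kernel(t1, t2, period=24.0, lengthscale=3.0, variance=1.0):
    d = np.subtract.outer(t1, t2)
    return variance * np.exp(
        -2.0 * np.sin(np.pi * d / period) ** 2 / lengthscale ** 2)

def nonstationary_periodic_kernel(t1, t2, amp):
    # Modulating by amp(t) lets the circadian pattern's strength drift,
    # one simple route to non-stationarity.
    return np.outer(amp(t1), amp(t2)) * periodic_kernel(t1, t2)

t = np.linspace(0.0, 72.0, 73)   # three days, hourly
K = nonstationary_periodic_kernel(t, t, amp=lambda s: 1.0 + 0.01 * s)
```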

ShapeShifter: Robust Physical Adversarial Attack on Faster R-CNN Object Detector

Title ShapeShifter: Robust Physical Adversarial Attack on Faster R-CNN Object Detector
Authors Shang-Tse Chen, Cory Cornelius, Jason Martin, Duen Horng Chau
Abstract Given the ability to directly manipulate image pixels in the digital input space, an adversary can easily generate imperceptible perturbations to fool a Deep Neural Network (DNN) image classifier, as demonstrated in prior work. In this work, we propose ShapeShifter, an attack that tackles the more challenging problem of crafting physical adversarial perturbations to fool image-based object detectors like Faster R-CNN. Attacking an object detector is more difficult than attacking an image classifier, as it needs to mislead the classification results in multiple bounding boxes with different scales. Extending the digital attack to the physical world adds another layer of difficulty, because it requires the perturbation to be robust enough to survive real-world distortions due to different viewing distances and angles, lighting conditions, and camera limitations. We show that the Expectation over Transformation technique, which was originally proposed to enhance the robustness of adversarial perturbations in image classification, can be successfully adapted to the object detection setting. ShapeShifter can generate adversarially perturbed stop signs that are consistently mis-detected by Faster R-CNN as other objects, posing a potential threat to autonomous vehicles and other safety-critical computer vision systems.
Tasks Adversarial Attack, Autonomous Vehicles, Image Classification, Object Detection
Published 2018-04-16
URL http://arxiv.org/abs/1804.05810v3
PDF http://arxiv.org/pdf/1804.05810v3.pdf
PWC https://paperswithcode.com/paper/shapeshifter-robust-physical-adversarial
Repo https://github.com/shangtse/robust-physical-attack
Framework tf
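
A compact sketch of an Expectation over Transformation step adapted to detection; detector_loss and random_transform are placeholders for the Faster R-CNN loss and a differentiable transformation sampler, and the tanh parameterization (images assumed in [-1, 1]) is also an assumption:

```python
# Optimize the perturbation against the detector's loss averaged over
# random physical-world transformations so it survives real distortions.
import torch

def eot_step(perturbation, scene, mask, target, detector_loss,
             random_transform, optimizer, samples=8):
    optimizer.zero_grad()
    loss = 0.0
    for _ in range(samples):
        # Paste the perturbed sign into the scene, then distort it the way
        # the physical world might (scale, rotation, lighting, ...).
        adv = scene * (1 - mask) + perturbation.tanh() * mask
        loss = loss + detector_loss(random_transform(adv), target)
    (loss / samples).backward()
    optimizer.step()
```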

MegaDepth: Learning Single-View Depth Prediction from Internet Photos

Title MegaDepth: Learning Single-View Depth Prediction from Internet Photos
Authors Zhengqi Li, Noah Snavely
Abstract Single-view depth prediction is a fundamental problem in computer vision. Recently, deep learning methods have led to significant progress, but such methods are limited by the available training data. Current datasets based on 3D sensors have key limitations, including indoor-only images (NYU), small numbers of training examples (Make3D), and sparse sampling (KITTI). We propose to use multi-view Internet photo collections, a virtually unlimited data source, to generate training data via modern structure-from-motion and multi-view stereo (MVS) methods, and present a large depth dataset called MegaDepth based on this idea. Data derived from MVS comes with its own challenges, including noise and unreconstructable objects. We address these challenges with new data cleaning methods, as well as automatically augmenting our data with ordinal depth relations generated using semantic segmentation. We validate the use of large amounts of Internet data by showing that models trained on MegaDepth exhibit strong generalization: not only to novel scenes, but also to other diverse datasets including Make3D, KITTI, and DIW, even when no images from those datasets are seen during training.
Tasks Depth Estimation, Semantic Segmentation
Published 2018-04-02
URL http://arxiv.org/abs/1804.00607v4
PDF http://arxiv.org/pdf/1804.00607v4.pdf
PWC https://paperswithcode.com/paper/megadepth-learning-single-view-depth
Repo https://github.com/zhengqili/MegaDepth
Framework pytorch
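
The ordinal-supervision idea can be sketched as a pairwise ranking loss on predicted log-depths; the exact form below follows the general style of relative-depth losses and is an assumption, not MegaDepth's verbatim objective:

```python
# For pixel pairs where segmentation implies an ordering (e.g. a person in
# front of a building), penalize predictions that order them the wrong way.
import torch
import torch.nn.functional as F

def ordinal_loss(log_depth, pairs, relations):
    """log_depth: (H, W) predicted log-depth map
    pairs: (N, 4) long tensor of pixel pairs (yi, xi, yj, xj)
    relations: (N,) float, +1 if point i is closer than point j, else -1
    """
    zi = log_depth[pairs[:, 0], pairs[:, 1]]
    zj = log_depth[pairs[:, 2], pairs[:, 3]]
    # Correctly ordered pairs (closer point has smaller log-depth) incur
    # nearly zero cost; inverted pairs are penalized smoothly.
    return F.softplus(relations * (zi - zj)).mean()
```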

Dynamic Fusion with Intra- and Inter- Modality Attention Flow for Visual Question Answering

Title Dynamic Fusion with Intra- and Inter- Modality Attention Flow for Visual Question Answering
Authors Gao Peng, Zhengkai Jiang, Haoxuan You, Pan Lu, Steven Hoi, Xiaogang Wang, Hongsheng Li
Abstract Learning effective fusion of multi-modality features is at the heart of visual question answering. We propose a novel method for dynamically fusing multi-modal features with intra- and inter-modality information flow, which alternately passes dynamic information within and across the visual and language modalities. It robustly captures high-level interactions between the language and vision domains, significantly improving visual question answering performance. We also show that the proposed dynamic intra-modality attention flow, conditioned on the other modality, can dynamically modulate the intra-modality attention of the target modality, which is vital for multi-modality feature fusion. Experimental evaluations on the VQA 2.0 dataset show that the proposed method achieves state-of-the-art VQA performance. Extensive ablation studies are carried out for a comprehensive analysis of the proposed method.
Tasks Question Answering, Visual Question Answering
Published 2018-12-13
URL https://arxiv.org/abs/1812.05252v4
PDF https://arxiv.org/pdf/1812.05252v4.pdf
PWC https://paperswithcode.com/paper/dynamic-fusion-with-intra-and-inter-modality
Repo https://github.com/bupt-cist/DFAF-for-VQA.pytorch
Framework pytorch
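
A simplified single-head sketch of one inter-modality attention-flow step; the paper's actual blocks are multi-head and also include dynamic intra-modality flows, so everything below is an illustrative reduction:

```python
# Regions attend over words and words attend over regions, symmetrically,
# passing information across the two modalities.
import torch
import torch.nn.functional as F

def inter_modality_flow(vis, txt):
    """vis: (B, Nv, d) visual region features; txt: (B, Nt, d) word features."""
    d = vis.shape[-1]
    attn_v = F.softmax(vis @ txt.transpose(1, 2) / d ** 0.5, dim=-1)
    vis_out = vis + attn_v @ txt      # vision updated with language info
    attn_t = F.softmax(txt @ vis.transpose(1, 2) / d ** 0.5, dim=-1)
    txt_out = txt + attn_t @ vis      # language updated with visual info
    return vis_out, txt_out
```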

DeepDiff: Deep-learning for predicting Differential gene expression from histone modifications

Title DeepDiff: Deep-learning for predicting Differential gene expression from histone modifications
Authors Arshdeep Sekhon, Ritambhara Singh, Yanjun Qi
Abstract Computational methods that predict differential gene expression from histone modification signals are highly desirable for understanding how histone modifications control the functional heterogeneity of cells through influencing differential gene regulation. Recent studies either failed to capture combinatorial effects on differential prediction or focused primarily on cell-type-specific analysis. In this paper, we develop a novel attention-based deep learning architecture, DeepDiff, that provides a unified and end-to-end solution to model and to interpret how dependencies among histone modifications control the differential patterns of gene regulation. DeepDiff uses a hierarchy of multiple long short-term memory (LSTM) modules to encode the spatial structure of input signals and to model how various histone modifications cooperate automatically. We introduce and train two levels of attention jointly with the target prediction, enabling DeepDiff to attend differentially to relevant modifications and to locate important genome positions for each modification. Additionally, DeepDiff introduces a novel deep-learning based multi-task formulation that uses cell-type-specific gene expression predictions as auxiliary tasks, encouraging richer feature embeddings in our primary task of differential expression prediction. Using data from the Roadmap Epigenomics Project (REMC) for ten different pairs of cell types, we show that DeepDiff significantly outperforms the state-of-the-art baselines for differential gene expression prediction. The learned attention weights are validated by observations from previous studies about how epigenetic mechanisms connect to differential gene expression. Code and results are available at deepchrome.org.
Tasks
Published 2018-07-10
URL http://arxiv.org/abs/1807.03878v1
PDF http://arxiv.org/pdf/1807.03878v1.pdf
PWC https://paperswithcode.com/paper/deepdiff-deep-learning-for-predicting
Repo https://github.com/QData/DeepDiffChrome
Framework pytorch
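
A rough sketch of the two attention levels described in the abstract, with illustrative sizes; this is a guess at the structure, not the DeepDiffChrome code:

```python
# Bin-level attention locates important genome positions within each
# histone modification (HM) track; HM-level attention weighs the
# modifications against each other before the final prediction.
import torch
import torch.nn as nn

class TwoLevelAttention(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(1, hidden, batch_first=True, bidirectional=True)
        self.bin_attn = nn.Linear(2 * hidden, 1)   # level 1: positions
        self.hm_attn = nn.Linear(2 * hidden, 1)    # level 2: modifications
        self.out = nn.Linear(2 * hidden, 1)

    def forward(self, x):                          # x: (B, n_hms, n_bins)
        B, M, T = x.shape
        h, _ = self.lstm(x.reshape(B * M, T, 1))   # encode each HM track
        a = torch.softmax(self.bin_attn(h), dim=1) # attend along the gene
        hm = (a * h).sum(dim=1).reshape(B, M, -1)  # one vector per HM
        b = torch.softmax(self.hm_attn(hm), dim=1) # attend across HMs
        return self.out((b * hm).sum(dim=1))       # (B, 1) prediction
```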

Exploring Recombination for Efficient Decoding of Neural Machine Translation

Title Exploring Recombination for Efficient Decoding of Neural Machine Translation
Authors Zhisong Zhang, Rui Wang, Masao Utiyama, Eiichiro Sumita, Hai Zhao
Abstract In Neural Machine Translation (NMT), the decoder can capture the features of the entire prediction history with neural connections and representations. This means that partial hypotheses with different prefixes will be regarded differently no matter how similar they are. However, this might be inefficient since some partial hypotheses can contain only local differences that will not influence future predictions. In this work, we introduce recombination in NMT decoding based on the concept of the "equivalence" of partial hypotheses. Heuristically, we use a simple n-gram suffix-based equivalence function and adapt it into beam search decoding. Through experiments on large-scale Chinese-to-English and English-to-German translation tasks, we show that the proposed method can obtain similar translation quality with a smaller beam size, making NMT decoding more efficient.
Tasks Machine Translation
Published 2018-08-25
URL http://arxiv.org/abs/1808.08482v2
PDF http://arxiv.org/pdf/1808.08482v2.pdf
PWC https://paperswithcode.com/paper/exploring-recombination-for-efficient
Repo https://github.com/zzsfornlp/znmt-merge
Framework none
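
The recombination heuristic itself is simple enough to sketch directly; the (tokens, score) hypothesis representation below is an assumption:

```python
# Hypotheses ending in the same n tokens are treated as equivalent, and
# only the best-scoring representative of each class is kept.
def recombine(beam, n=4):
    """beam: list of (tokens, score) partial hypotheses; higher score wins."""
    best = {}
    for tokens, score in beam:
        suffix = tuple(tokens[-n:])          # equivalence class: last n tokens
        if suffix not in best or score > best[suffix][1]:
            best[suffix] = (tokens, score)
    return list(best.values())

# With n=2, "the cat sat on the" and "a cat sat on the" recombine: the
# heuristic assumes future predictions conditioned on the shared suffix
# "on the" will be (near-)identical.
```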

TrackingNet: A Large-Scale Dataset and Benchmark for Object Tracking in the Wild

Title TrackingNet: A Large-Scale Dataset and Benchmark for Object Tracking in the Wild
Authors Matthias Müller, Adel Bibi, Silvio Giancola, Salman Al-Subaihi, Bernard Ghanem
Abstract Despite the numerous developments in object tracking, further development of current tracking algorithms is limited by small and mostly saturated datasets. As a matter of fact, data-hungry trackers based on deep learning currently rely on object detection datasets due to the scarcity of dedicated large-scale tracking datasets. In this work, we present TrackingNet, the first large-scale dataset and benchmark for object tracking in the wild. We provide more than 30K videos with more than 14 million dense bounding box annotations. Our dataset covers a wide selection of object classes in broad and diverse contexts. By releasing such a large-scale dataset, we expect deep trackers to further improve and generalize. In addition, we introduce a new benchmark composed of 500 novel videos, modeled with a distribution similar to our training dataset. By sequestering the annotation of the test set and providing an online evaluation server, we provide a fair benchmark for future development of object trackers. Deep trackers fine-tuned on a fraction of our dataset improve their performance by up to 1.6% on OTB100 and up to 1.7% on TrackingNet Test. We provide an extensive benchmark on TrackingNet by evaluating more than 20 trackers. Our results suggest that object tracking in the wild is far from being solved.
Tasks Object Detection, Object Tracking
Published 2018-03-28
URL http://arxiv.org/abs/1803.10794v1
PDF http://arxiv.org/pdf/1803.10794v1.pdf
PWC https://paperswithcode.com/paper/trackingnet-a-large-scale-dataset-and
Repo https://github.com/SilvioGiancola/TrackingNet-devkit
Framework none

Classification of Point Cloud Scenes with Multiscale Voxel Deep Network

Title Classification of Point Cloud Scenes with Multiscale Voxel Deep Network
Authors Xavier Roynard, Jean-Emmanuel Deschaud, François Goulette
Abstract In this article we describe a new convolutional neural network (CNN) to classify 3D point clouds of urban or indoor scenes. Solutions are given to the problems encountered when working on scene point clouds, and a network is described that classifies points using only their positions in a multi-scale neighborhood. On the reduced-8 Semantic3D benchmark [Hackel et al., 2017], this network, ranked second, beats the state of the art among point classification methods (those not using a regularization step).
Tasks Semantic Segmentation
Published 2018-04-10
URL http://arxiv.org/abs/1804.03583v1
PDF http://arxiv.org/pdf/1804.03583v1.pdf
PWC https://paperswithcode.com/paper/classification-of-point-cloud-scenes-with
Repo https://github.com/xroynard/ms_deepvoxscene
Framework pytorch
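
A sketch of the multi-scale voxel input such a network consumes; the resolution and scales below are illustrative assumptions:

```python
# For each query point, build occupancy grids of its neighborhood at
# several voxel sizes, giving the CNN both fine local geometry and
# coarser context at once.
import numpy as np

def multiscale_occupancy(points, center, scales=(0.1, 0.5, 2.5), res=16):
    """points: (N, 3) scene point cloud; center: (3,) query point."""
    grids = []
    for voxel in scales:
        half = voxel * res / 2
        local = points[np.all(np.abs(points - center) < half, axis=1)]
        idx = np.clip(np.floor((local - center + half) / voxel).astype(int),
                      0, res - 1)
        grid = np.zeros((res, res, res), dtype=np.float32)
        grid[idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0   # mark occupied voxels
        grids.append(grid)
    return grids   # one grid per scale, fed to parallel 3D-CNN branches
```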

Efficient GAN-Based Anomaly Detection

Title Efficient GAN-Based Anomaly Detection
Authors Houssam Zenati, Chuan Sheng Foo, Bruno Lecouat, Gaurav Manek, Vijay Ramaseshan Chandrasekhar
Abstract Generative adversarial networks (GANs) are able to model the complex high-dimensional distributions of real-world data, which suggests they could be effective for anomaly detection. However, few works have explored the use of GANs for the anomaly detection task. We leverage recently developed GAN models for anomaly detection, and achieve state-of-the-art performance on image and network-intrusion datasets, while being several hundred-fold faster at test time than the only published GAN-based method.
Tasks Anomaly Detection
Published 2018-02-17
URL http://arxiv.org/abs/1802.06222v2
PDF http://arxiv.org/pdf/1802.06222v2.pdf
PWC https://paperswithcode.com/paper/efficient-gan-based-anomaly-detection
Repo https://github.com/houssamzenati/Efficient-GAN-Anomaly-Detection
Framework tf
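
A sketch of a BiGAN-style anomaly score in the spirit of this work; the networks and the mixing weight alpha are placeholders, and the exact score components are assumptions:

```python
# The learned encoder E makes test-time inference a single forward pass
# (the source of the speedup over iterative latent search); the score
# mixes reconstruction error with a discriminator feature-matching term.
import torch

def anomaly_score(x, E, G, D_features, alpha=0.9):
    """E: encoder x -> z;  G: generator z -> x;  D_features: x -> features."""
    with torch.no_grad():
        recon = G(E(x))                               # reconstruct via latent
        lg = (x - recon).flatten(1).norm(p=1, dim=1)  # reconstruction term
        ld = (D_features(x) - D_features(recon)
              ).flatten(1).norm(p=1, dim=1)           # feature-matching term
    return alpha * lg + (1 - alpha) * ld              # higher = more anomalous
```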