July 28, 2019


Paper Group ANR 376

UnFlow: Unsupervised Learning of Optical Flow with a Bidirectional Census Loss. Understanding and Predicting The Attractiveness of Human Action Shot. Dimensionality Reduction on Grassmannian via Riemannian Optimization: A Generalized Perspective. Automatic Disambiguation of French Discourse Connectives. UTS submission to Google YouTube-8M Challenge …

UnFlow: Unsupervised Learning of Optical Flow with a Bidirectional Census Loss


Title	UnFlow: Unsupervised Learning of Optical Flow with a Bidirectional Census Loss
Authors	Simon Meister, Junhwa Hur, Stefan Roth
Abstract	In the era of end-to-end deep learning, many advances in computer vision are driven by large amounts of labeled data. In the optical flow setting, however, obtaining dense per-pixel ground truth for real scenes is difficult and thus such data is rare. Therefore, recent end-to-end convolutional networks for optical flow rely on synthetic datasets for supervision, but the domain mismatch between training and test scenarios continues to be a challenge. Inspired by classical energy-based optical flow methods, we design an unsupervised loss based on occlusion-aware bidirectional flow estimation and the robust census transform to circumvent the need for ground truth flow. On the KITTI benchmarks, our unsupervised approach outperforms previous unsupervised deep networks by a large margin, and is even more accurate than similar supervised methods trained on synthetic datasets alone. By optionally fine-tuning on the KITTI training data, our method achieves competitive optical flow accuracy on the KITTI 2012 and 2015 benchmarks, thus in addition enabling generic pre-training of supervised networks for datasets with limited amounts of ground truth.
Tasks	Optical Flow Estimation
Published	2017-11-21
URL	http://arxiv.org/abs/1711.07837v1
PDF	http://arxiv.org/pdf/1711.07837v1.pdf
PWC	https://paperswithcode.com/paper/unflow-unsupervised-learning-of-optical-flow
Repo
Framework
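
The census transform named in the UnFlow abstract can be illustrated with a short NumPy sketch. This is a plain binary census transform with a Hamming distance, shown only to make clear why the descriptor is robust to brightness changes; it is not the differentiable loss used in the paper, and the function names, the 3x3 default window, and the edge padding are assumptions made for this example.

```python
import numpy as np

def census_transform(img, radius=1):
    """Binary census signature: one bit per neighbor, set when neighbor < center."""
    h, w = img.shape
    padded = np.pad(img, radius, mode="edge")  # assumption: replicate border pixels
    bits = []
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dy == 0 and dx == 0:
                continue  # skip the center pixel itself
            neighbor = padded[radius + dy: radius + dy + h,
                              radius + dx: radius + dx + w]
            bits.append(neighbor < img)
    return np.stack(bits, axis=-1)  # shape (h, w, (2 * radius + 1) ** 2 - 1)

def census_distance(a, b, radius=1):
    """Per-pixel Hamming distance between the census signatures of two images."""
    return np.sum(census_transform(a, radius) != census_transform(b, radius), axis=-1)
```

Because the signature only records the ordering of each neighbor relative to the center pixel, adding a constant brightness offset to an image leaves its signature unchanged, so `census_distance(img, img + c)` is zero everywhere.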

Understanding and Predicting The Attractiveness of Human Action Shot


Title	Understanding and Predicting The Attractiveness of Human Action Shot
Authors	Bin Dai, Baoyuan Wang, Gang Hua
Abstract	Selecting attractive photos from a human action shot sequence is quite challenging, because of the subjective nature of the “attractiveness”, which is mainly a combined factor of human pose in action and the background. Prior works have actively studied high-level image attributes including interestingness, memorability, popularity, and aesthetics. However, none of them has ever studied the “attractiveness” of human action shot. In this paper, we present the first study of the “attractiveness” of human action shots by taking a systematic data-driven approach. Specifically, we create a new action-shot dataset composed of about 8000 high quality action-shot photos. We further conduct rich crowd-sourced human judge studies on Amazon Mechanical Turk (AMT) in terms of global attractiveness of a single photo, and relative attractiveness of a pair of photos. A deep Siamese network with a novel hybrid distribution matching loss was further proposed to fully exploit both types of ratings. Extensive experiments reveal that (1) the property of action shot attractiveness is subjective but predictable and (2) our proposed method is both efficient and effective for predicting the attractive human action shots.
Tasks
Published	2017-11-02
URL	http://arxiv.org/abs/1711.00677v1
PDF	http://arxiv.org/pdf/1711.00677v1.pdf
PWC	https://paperswithcode.com/paper/understanding-and-predicting-the
Repo
Framework

Dimensionality Reduction on Grassmannian via Riemannian Optimization: A Generalized Perspective


Title	Dimensionality Reduction on Grassmannian via Riemannian Optimization: A Generalized Perspective
Authors	Tianci Liu, Zelin Shi, Yunpeng Liu
Abstract	This paper proposes a generalized framework with joint normalization which learns lower-dimensional subspaces with maximum discriminative power by making use of the Riemannian geometry. In particular, we model the similarity/dissimilarity between subspaces using various metrics defined on Grassmannian and formulate dimensionality reduction as a non-linear constraint optimization problem considering the orthogonalization. To obtain the linear mapping, we derive the components required to perform Riemannian optimization (e.g., Riemannian conjugate gradient) from the original Grassmannian through an orthonormal projection. We respect the Riemannian geometry of the Grassmann manifold and search for this projection directly from one Grassmann manifold to another face-to-face without any additional transformations. In this natural geometry-aware way, any metric on the Grassmann manifold can be resided in our model theoretically. We have combined five metrics with our model and the learning process can be treated as an unconstrained optimization problem on a Grassmann manifold. Experiments on several datasets demonstrate that our approach leads to a significant accuracy gain over state-of-the-art methods.
Tasks	Dimensionality Reduction
Published	2017-11-17
URL	http://arxiv.org/abs/1711.06382v1
PDF	http://arxiv.org/pdf/1711.06382v1.pdf
PWC	https://paperswithcode.com/paper/dimensionality-reduction-on-grassmannian-via
Repo
Framework

Automatic Disambiguation of French Discourse Connectives


Title	Automatic Disambiguation of French Discourse Connectives
Authors	Majid Laali, Leila Kosseim
Abstract	Discourse connectives (e.g. however, because) are terms that can explicitly convey a discourse relation within a text. While discourse connectives have been shown to be an effective clue to automatically identify discourse relations, they are not always used to convey such relations, thus they should first be disambiguated between discourse usage and non-discourse usage. In this paper, we investigate the applicability of features proposed for the disambiguation of English discourse connectives for French. Our results with the French Discourse Treebank (FDTB) show that syntactic and lexical features developed for English texts are as effective for French and allow the disambiguation of French discourse connectives with an accuracy of 94.2%.
Tasks
Published	2017-04-18
URL	http://arxiv.org/abs/1704.05162v1
PDF	http://arxiv.org/pdf/1704.05162v1.pdf
PWC	https://paperswithcode.com/paper/automatic-disambiguation-of-french-discourse
Repo
Framework

UTS submission to Google YouTube-8M Challenge 2017


Title	UTS submission to Google YouTube-8M Challenge 2017
Authors	Linchao Zhu, Yanbin Liu, Yi Yang
Abstract	In this paper, we present our solution to Google YouTube-8M Video Classification Challenge 2017. We leveraged both video-level and frame-level features in the submission. For video-level classification, we simply used a 200-mixture Mixture of Experts (MoE) layer, which achieves GAP 0.802 on the validation set with a single model. For frame-level classification, we utilized several variants of recurrent neural networks, sequence aggregation with attention mechanism and 1D convolutional models. We achieved GAP 0.8408 on the private testing set with the ensemble model. The source code of our models can be found at https://github.com/ffmpbgrnn/yt8m.
Tasks	Video Classification
Published	2017-07-13
URL	http://arxiv.org/abs/1707.04143v1
PDF	http://arxiv.org/pdf/1707.04143v1.pdf
PWC	https://paperswithcode.com/paper/uts-submission-to-google-youtube-8m-challenge
Repo
Framework

CAR-Net: Clairvoyant Attentive Recurrent Network


Title	CAR-Net: Clairvoyant Attentive Recurrent Network
Authors	Amir Sadeghian, Ferdinand Legros, Maxime Voisin, Ricky Vesel, Alexandre Alahi, Silvio Savarese
Abstract	We present an interpretable framework for path prediction that leverages dependencies between agents’ behaviors and their spatial navigation environment. We exploit two sources of information: the past motion trajectory of the agent of interest and a wide top-view image of the navigation scene. We propose a Clairvoyant Attentive Recurrent Network (CAR-Net) that learns where to look in a large image of the scene when solving the path prediction task. Our method can attend to any area, or combination of areas, within the raw image (e.g., road intersections) when predicting the trajectory of the agent. This allows us to visualize fine-grained semantic elements of navigation scenes that influence the prediction of trajectories. To study the impact of space on agents’ trajectories, we build a new dataset made of top-view images of hundreds of scenes (Formula One racing tracks) where agents’ behaviors are heavily influenced by known areas in the images (e.g., upcoming turns). CAR-Net successfully attends to these salient regions. Additionally, CAR-Net reaches state-of-the-art accuracy on the standard trajectory forecasting benchmark, Stanford Drone Dataset (SDD). Finally, we show CAR-Net’s ability to generalize to unseen scenes.
Tasks
Published	2017-11-28
URL	http://arxiv.org/abs/1711.10061v3
PDF	http://arxiv.org/pdf/1711.10061v3.pdf
PWC	https://paperswithcode.com/paper/car-net-clairvoyant-attentive-recurrent
Repo
Framework

A generalization of the Jensen divergence: The chord gap divergence


Title	A generalization of the Jensen divergence: The chord gap divergence
Authors	Frank Nielsen
Abstract	We introduce a novel family of distances, called the chord gap divergences, that generalizes the Jensen divergences (also called the Burbea-Rao distances), and study its properties. It follows a generalization of the celebrated statistical Bhattacharyya distance that is frequently met in applications. We report an iterative concave-convex procedure for computing centroids, and analyze the performance of the $k$-means++ clustering with respect to that new dissimilarity measure by introducing the Taylor-Lagrange remainder form of the skew Jensen divergences.
Tasks
Published	2017-09-29
URL	http://arxiv.org/abs/1709.10498v2
PDF	http://arxiv.org/pdf/1709.10498v2.pdf
PWC	https://paperswithcode.com/paper/a-generalization-of-the-jensen-divergence-the
Repo
Framework
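
For readers unfamiliar with the Jensen divergence that the chord gap divergence generalizes, the following sketch computes the standard form J_F(p, q) = (F(p) + F(q)) / 2 - F((p + q) / 2) for a convex generator F. The chord gap divergence itself is not reproduced here, since the abstract does not give its definition; the function names are chosen for this example only.

```python
import numpy as np

def jensen_divergence(F, p, q):
    """Jensen divergence: mean of F at the endpoints minus F at the midpoint."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return 0.5 * (F(p) + F(q)) - F(0.5 * (p + q))

def neg_entropy(p):
    """Negative Shannon entropy, a convex generator; 0 * log(0) is treated as 0."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return float(np.sum(p[nz] * np.log(p[nz])))
```

With `neg_entropy` as the generator, `jensen_divergence` reduces to the Jensen-Shannon divergence, so two distributions with disjoint support, such as `[1, 0]` and `[0, 1]`, are at distance log 2.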

Fast and Efficient Skin Detection for Facial Detection


Title	Fast and Efficient Skin Detection for Facial Detection
Authors	Mohammad Reza Mahmoodi
Abstract	In this paper, an efficient skin detection system is proposed. The algorithm is based on a very fast and efficient pre-processing step that uses ternary conversion to identify candidate windows, followed by a novel local two-stage diffusion method which has an F-score of 0.5978 on the SDD dataset. The pre-processing step has been shown to boost the speed of the system by eliminating 82% of an image on average, while keeping the true positive rate above 98%. In addition, a novel segmentation algorithm is designed to process candidate windows, and it is quantitatively and qualitatively shown to be very efficient in terms of accuracy. The algorithm has been implemented on an FPGA to obtain real-time processing speed. The system is fully pipelined, and the inherent parallel structure of the algorithm is fully exploited to maximize performance. The system is implemented on a Spartan-6 LXT45 Xilinx FPGA and is capable of processing 98 frames of 640*480 24-bit color images per second.
Tasks
Published	2017-01-19
URL	http://arxiv.org/abs/1701.05595v1
PDF	http://arxiv.org/pdf/1701.05595v1.pdf
PWC	https://paperswithcode.com/paper/fast-and-efficient-skin-detection-for-facial
Repo
Framework
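
To put the reported frame rate in perspective, the short calculation below converts 98 frames per second of 640*480 24-bit images into pixels and raw bytes per second. It uses only the figures quoted in the abstract; the helper name is hypothetical.

```python
def pixel_throughput(width, height, fps, bytes_per_pixel):
    """Return (pixels per second, raw bytes per second) for an uncompressed stream."""
    pixels_per_second = width * height * fps
    return pixels_per_second, pixels_per_second * bytes_per_pixel

# 24-bit color means 3 bytes per pixel.
pixels, raw_bytes = pixel_throughput(640, 480, 98, 3)
print(pixels, raw_bytes)  # 30105600 90316800
```

That is roughly 30.1 million pixels, or about 90.3 MB of raw image data, per second.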

Application of generative autoencoder in de novo molecular design


Title	Application of generative autoencoder in de novo molecular design
Authors	Thomas Blaschke, Marcus Olivecrona, Ola Engkvist, Jürgen Bajorath, Hongming Chen
Abstract	A major challenge in computational chemistry is the generation of novel molecular structures with desirable pharmacological and physicochemical properties. In this work, we investigate the potential use of autoencoders, a deep learning methodology, for de novo molecular design. Various generative autoencoders were used to map molecule structures into a continuous latent space and vice versa, and their performance as structure generators was assessed. Our results show that the latent space preserves the chemical similarity principle and thus can be used for the generation of analogue structures. Furthermore, the latent space created by autoencoders was searched systematically to generate novel compounds with predicted activity against dopamine receptor type 2, and compounds similar to known active compounds not included in the training set were identified.
Tasks
Published	2017-11-21
URL	http://arxiv.org/abs/1711.07839v1
PDF	http://arxiv.org/pdf/1711.07839v1.pdf
PWC	https://paperswithcode.com/paper/application-of-generative-autoencoder-in-de
Repo
Framework

Data Fusion on Motion and Magnetic Sensors embedded on Mobile Devices for the Identification of Activities of Daily Living


Title	Data Fusion on Motion and Magnetic Sensors embedded on Mobile Devices for the Identification of Activities of Daily Living
Authors	Ivan Miguel Pires, Nuno M. Garcia, Nuno Pombo, Francisco Flórez-Revuelta, Susanna Spinsante
Abstract	Several types of sensors have been available in off-the-shelf mobile devices, including motion, magnetic, vision, acoustic, and location sensors. This paper focuses on the fusion of the data acquired from motion and magnetic sensors, i.e., accelerometer, gyroscope and magnetometer sensors, for the recognition of Activities of Daily Living (ADL) using pattern recognition techniques. The system developed in this study includes data acquisition, data processing, data fusion, and artificial intelligence methods. Artificial Neural Networks (ANN) are included in artificial intelligence methods, which are used in this study for the recognition of ADL. The purpose of this study is the creation of a new method using ANN for the identification of ADL, comparing three types of ANN, in order to achieve results with a reliable accuracy. The best accuracy was obtained with Deep Learning, which, after the application of the L2 regularization and normalization techniques on the sensors data, reports an accuracy of 89.51%.
Tasks	L2 Regularization
Published	2017-10-31
URL	http://arxiv.org/abs/1711.07328v1
PDF	http://arxiv.org/pdf/1711.07328v1.pdf
PWC	https://paperswithcode.com/paper/data-fusion-on-motion-and-magnetic-sensors
Repo
Framework

Efficient Online Bandit Multiclass Learning with $\tilde{O}(\sqrt{T})$ Regret


Title	Efficient Online Bandit Multiclass Learning with $\tilde{O}(\sqrt{T})$ Regret
Authors	Alina Beygelzimer, Francesco Orabona, Chicheng Zhang
Abstract	We present an efficient second-order algorithm with $\tilde{O}(\frac{1}{\eta}\sqrt{T})$ regret for the bandit online multiclass problem. The regret bound holds simultaneously with respect to a family of loss functions parameterized by $\eta$, for a range of $\eta$ restricted by the norm of the competitor. The family of loss functions ranges from hinge loss ($\eta=0$) to squared hinge loss ($\eta=1$). This provides a solution to the open problem of (J. Abernethy and A. Rakhlin. An efficient bandit algorithm for $\sqrt{T}$-regret in online multiclass prediction? In COLT, 2009). We test our algorithm experimentally, showing that it also performs favorably against earlier algorithms.
Tasks
Published	2017-02-25
URL	http://arxiv.org/abs/1702.07958v3
PDF	http://arxiv.org/pdf/1702.07958v3.pdf
PWC	https://paperswithcode.com/paper/efficient-online-bandit-multiclass-learning
Repo
Framework

NISP: Pruning Networks using Neuron Importance Score Propagation


Title	NISP: Pruning Networks using Neuron Importance Score Propagation
Authors	Ruichi Yu, Ang Li, Chun-Fu Chen, Jui-Hsin Lai, Vlad I. Morariu, Xintong Han, Mingfei Gao, Ching-Yung Lin, Larry S. Davis
Abstract	To reduce the significant redundancy in deep Convolutional Neural Networks (CNNs), most existing methods prune neurons by only considering statistics of an individual layer or two consecutive layers (e.g., prune one layer to minimize the reconstruction error of the next layer), ignoring the effect of error propagation in deep networks. In contrast, we argue that it is essential to prune neurons in the entire neural network jointly based on a unified goal: minimizing the reconstruction error of important responses in the “final response layer” (FRL), which is the second-to-last layer before classification, for a pruned network to retain its predictive power. Specifically, we apply feature ranking techniques to measure the importance of each neuron in the FRL, and formulate network pruning as a binary integer optimization problem and derive a closed-form solution to it for pruning neurons in earlier layers. Based on our theoretical analysis, we propose the Neuron Importance Score Propagation (NISP) algorithm to propagate the importance scores of final responses to every neuron in the network. The CNN is pruned by removing neurons with least importance, and then fine-tuned to retain its predictive power. NISP is evaluated on several datasets with multiple CNN models and demonstrated to achieve significant acceleration and compression with negligible accuracy loss.
Tasks	Network Pruning
Published	2017-11-16
URL	http://arxiv.org/abs/1711.05908v3
PDF	http://arxiv.org/pdf/1711.05908v3.pdf
PWC	https://paperswithcode.com/paper/nisp-pruning-networks-using-neuron-importance
Repo
Framework
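
The idea of propagating importance scores from the final response layer back to earlier neurons can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: it assumes fully connected layers and assumes that a neuron's score is the sum of the next layer's scores weighted by absolute weight values (the abstract does not state the propagation rule). The function names and the `keep_ratio` parameter are hypothetical.

```python
import numpy as np

def propagate_importance(weights, final_scores):
    """Propagate scores backward; weights[k] has shape (n_out, n_in) for layer k.

    Returns one score vector per layer input, followed by the final scores.
    Assumed rule: scores_in = |W|^T @ scores_out.
    """
    scores = [np.asarray(final_scores, dtype=float)]
    for W in reversed(weights):
        scores.append(np.abs(W).T @ scores[-1])
    return scores[::-1]

def keep_mask(scores, keep_ratio):
    """Boolean mask that keeps the top `keep_ratio` fraction of neurons by score."""
    n_keep = max(1, int(round(keep_ratio * len(scores))))
    mask = np.zeros(len(scores), dtype=bool)
    mask[np.argsort(-scores)[:n_keep]] = True
    return mask
```

In this sketch the lowest-scoring neurons in each layer are the ones dropped; the fine-tuning step mentioned in the abstract would follow pruning and is not shown.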

Search Engine Guided Non-Parametric Neural Machine Translation


Title	Search Engine Guided Non-Parametric Neural Machine Translation
Authors	Jiatao Gu, Yong Wang, Kyunghyun Cho, Victor O. K. Li
Abstract	In this paper, we extend an attention-based neural machine translation (NMT) model by allowing it to access an entire training set of parallel sentence pairs even after training. The proposed approach consists of two stages. In the first stage (the retrieval stage), an off-the-shelf, black-box search engine is used to retrieve a small subset of sentence pairs from a training set given a source sentence. These pairs are further filtered based on a fuzzy matching score based on edit distance. In the second stage (the translation stage), a novel translation model, called translation memory enhanced NMT (TM-NMT), seamlessly uses both the source sentence and a set of retrieved sentence pairs to perform the translation. Empirical evaluation on three language pairs (En-Fr, En-De, and En-Es) shows that the proposed approach significantly outperforms the baseline approach and the improvement is more significant when more relevant sentence pairs were retrieved.
Tasks	Machine Translation
Published	2017-05-20
URL	http://arxiv.org/abs/1705.07267v2
PDF	http://arxiv.org/pdf/1705.07267v2.pdf
PWC	https://paperswithcode.com/paper/search-engine-guided-non-parametric-neural
Repo
Framework
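
The filtering step described in the abstract relies on a fuzzy matching score based on edit distance. A common definition, assumed here because the abstract does not spell it out, is one minus the token-level edit distance divided by the length of the longer sentence. The sketch below uses whitespace tokenization and hypothetical function names.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (lists of tokens or strings)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution or match
        prev = curr
    return prev[-1]

def fuzzy_match(source, candidate):
    """Assumed score: 1 - edit_distance / max length, so 1.0 means identical."""
    a, b = source.split(), candidate.split()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest
```

Retrieved pairs whose source side scores below some threshold against the input sentence would be discarded; the threshold value is a tuning choice not given in the abstract.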

Learning to Remember Rare Events


Title	Learning to Remember Rare Events
Authors	Łukasz Kaiser, Ofir Nachum, Aurko Roy, Samy Bengio
Abstract	Despite recent advances, memory-augmented deep neural networks are still limited when it comes to life-long and one-shot learning, especially in remembering rare events. We present a large-scale life-long memory module for use in deep learning. The module exploits fast nearest-neighbor algorithms for efficiency and thus scales to large memory sizes. Except for the nearest-neighbor query, the module is fully differentiable and trained end-to-end with no extra supervision. It operates in a life-long manner, i.e., without the need to reset it during training. Our memory module can be easily added to any part of a supervised neural network. To show its versatility we add it to a number of networks, from simple convolutional ones tested on image classification to deep sequence-to-sequence and recurrent-convolutional models. In all cases, the enhanced network gains the ability to remember and do life-long one-shot learning. Our module remembers training examples shown many thousands of steps in the past and it can successfully generalize from them. We set new state-of-the-art for one-shot learning on the Omniglot dataset and demonstrate, for the first time, life-long one-shot learning in recurrent neural networks on a large-scale machine translation task.
Tasks	Few-Shot Image Classification, Image Classification, Machine Translation, Omniglot, One-Shot Learning
Published	2017-03-09
URL	http://arxiv.org/abs/1703.03129v1
PDF	http://arxiv.org/pdf/1703.03129v1.pdf
PWC	https://paperswithcode.com/paper/learning-to-remember-rare-events
Repo
Framework
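
The nearest-neighbor query at the heart of the memory module can be illustrated with a brute-force lookup over stored keys. This sketch assumes cosine similarity and an exact search in place of the fast nearest-neighbor algorithms the abstract mentions; the memory update rule and the training loss are not shown, and all names are hypothetical.

```python
import numpy as np

def normalize(x):
    """Scale vectors to unit length along the last axis."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def memory_lookup(query, keys, values, k=1):
    """Return the values and cosine similarities of the k stored keys nearest to query."""
    sims = normalize(keys) @ normalize(query)  # cosine similarity to every key
    top = np.argsort(-sims)[:k]                # indices of the k most similar keys
    return values[top], sims[top]
```

For example, with keys `[[1, 0], [0, 1], [-1, 0]]` and values `[7, 8, 9]`, the query `[0.1, 2.0]` is closest in direction to the second key, so the lookup returns the value `8`.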

On the Challenges of Sentiment Analysis for Dynamic Events


Title	On the Challenges of Sentiment Analysis for Dynamic Events
Authors	Monireh Ebrahimi, Amir Hossein Yazdavar, Amit Sheth
Abstract	With the proliferation of social media over the last decade, determining people’s attitude with respect to a specific topic, document, interaction or events has fueled research interest in natural language processing and introduced a new channel called sentiment and emotion analysis. For instance, businesses routinely look to develop systems to automatically understand their customer conversations by identifying the relevant content to enhance marketing their products and managing their reputations. Previous efforts to assess people’s sentiment on Twitter have suggested that Twitter may be a valuable resource for studying political sentiment and that it reflects the offline political landscape. According to a Pew Research Center report, in January 2016 44 percent of US adults stated having learned about the presidential election through social media. Furthermore, 24 percent reported use of social media posts of the two candidates as a source of news and information, which is more than the 15 percent who have used both candidates’ websites or emails combined. The first presidential debate between Trump and Hillary was the most tweeted debate ever with 17.1 million tweets.
Tasks	Emotion Recognition, Sentiment Analysis
Published	2017-10-06
URL	http://arxiv.org/abs/1710.02514v1
PDF	http://arxiv.org/pdf/1710.02514v1.pdf
PWC	https://paperswithcode.com/paper/on-the-challenges-of-sentiment-analysis-for
Repo
Framework