Paper Group ANR 376
UnFlow: Unsupervised Learning of Optical Flow with a Bidirectional Census Loss. Understanding and Predicting The Attractiveness of Human Action Shot. Dimensionality Reduction on Grassmannian via Riemannian Optimization: A Generalized Perspective. Automatic Disambiguation of French Discourse Connectives. UTS submission to Google YouTube-8M Challenge …
UnFlow: Unsupervised Learning of Optical Flow with a Bidirectional Census Loss
Title | UnFlow: Unsupervised Learning of Optical Flow with a Bidirectional Census Loss |
Authors | Simon Meister, Junhwa Hur, Stefan Roth |
Abstract | In the era of end-to-end deep learning, many advances in computer vision are driven by large amounts of labeled data. In the optical flow setting, however, obtaining dense per-pixel ground truth for real scenes is difficult and thus such data is rare. Therefore, recent end-to-end convolutional networks for optical flow rely on synthetic datasets for supervision, but the domain mismatch between training and test scenarios continues to be a challenge. Inspired by classical energy-based optical flow methods, we design an unsupervised loss based on occlusion-aware bidirectional flow estimation and the robust census transform to circumvent the need for ground truth flow. On the KITTI benchmarks, our unsupervised approach outperforms previous unsupervised deep networks by a large margin, and is even more accurate than similar supervised methods trained on synthetic datasets alone. By optionally fine-tuning on the KITTI training data, our method achieves competitive optical flow accuracy on the KITTI 2012 and 2015 benchmarks, thus in addition enabling generic pre-training of supervised networks for datasets with limited amounts of ground truth. |
Tasks | Optical Flow Estimation |
Published | 2017-11-21 |
URL | http://arxiv.org/abs/1711.07837v1 |
http://arxiv.org/pdf/1711.07837v1.pdf | |
PWC | https://paperswithcode.com/paper/unflow-unsupervised-learning-of-optical-flow |
Repo | |
Framework | |
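As an editorial illustration of the loss design described above, here is a minimal NumPy sketch of a hard census transform with a Hamming-distance data term and a forward-backward occlusion check. It is not the authors' released implementation: the thresholds are assumptions, and the paper uses a soft, differentiable census formulation.

```python
import numpy as np

def census_transform(img, radius=1):
    """Binary census signature of each pixel over its (2r+1)^2 - 1 neighbours
    (a hard variant; the paper uses a soft, differentiable formulation)."""
    h, w = img.shape
    padded = np.pad(img, radius, mode="edge")
    bits = []
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dy == 0 and dx == 0:
                continue
            shifted = padded[radius + dy:radius + dy + h,
                             radius + dx:radius + dx + w]
            bits.append((shifted > img).astype(np.uint8))
    return np.stack(bits, axis=-1)              # (H, W, 8) for radius=1

def census_data_loss(img1, img2_warped):
    """Photometric data term: mean Hamming distance between census signatures
    of the first frame and the second frame warped by the estimated flow."""
    return np.mean(np.sum(census_transform(img1) != census_transform(img2_warped),
                          axis=-1))

def occlusion_mask(flow_fw, flow_bw_warped, alpha1=0.01, alpha2=0.5):
    """Forward-backward consistency check: mark a pixel occluded when the
    forward flow and the (warped) backward flow do not cancel out.
    flow_* have shape (H, W, 2); the thresholds here are assumptions."""
    sq_diff = np.sum((flow_fw + flow_bw_warped) ** 2, axis=-1)
    sq_mag = np.sum(flow_fw ** 2 + flow_bw_warped ** 2, axis=-1)
    return sq_diff > alpha1 * sq_mag + alpha2   # True = occluded, excluded from the loss
```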
Understanding and Predicting The Attractiveness of Human Action Shot
Title | Understanding and Predicting The Attractiveness of Human Action Shot |
Authors | Bin Dai, Baoyuan Wang, Gang Hua |
Abstract | Selecting attractive photos from a human action shot sequence is quite challenging because of the subjective nature of “attractiveness”, which is mainly a combined factor of the human pose in action and the background. Prior works have actively studied high-level image attributes including interestingness, memorability, popularity, and aesthetics. However, none of them has studied the “attractiveness” of human action shots. In this paper, we present the first study of the “attractiveness” of human action shots by taking a systematic data-driven approach. Specifically, we create a new action-shot dataset composed of about 8000 high-quality action-shot photos. We further conduct rich crowd-sourced human judgment studies on Amazon Mechanical Turk (AMT) in terms of the global attractiveness of a single photo and the relative attractiveness of a pair of photos. A deep Siamese network with a novel hybrid distribution matching loss is further proposed to fully exploit both types of ratings. Extensive experiments reveal that (1) the property of action shot attractiveness is subjective but predictable, and (2) our proposed method is both efficient and effective for predicting attractive human action shots. |
Tasks | |
Published | 2017-11-02 |
URL | http://arxiv.org/abs/1711.00677v1 |
http://arxiv.org/pdf/1711.00677v1.pdf | |
PWC | https://paperswithcode.com/paper/understanding-and-predicting-the |
Repo | |
Framework | |
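The hybrid distribution matching loss is not spelled out in the abstract, so the following is only a generic pairwise margin ranking loss, a hedged stand-in for how a Siamese scorer can exploit relative-attractiveness ratings.

```python
import numpy as np

def margin_ranking_loss(score_a, score_b, margin=1.0):
    """Pairwise loss for 'photo A is more attractive than photo B' labels.
    score_a / score_b are scalar outputs of a shared (Siamese) scoring
    network; this plain margin loss is a stand-in for the paper's hybrid
    distribution matching loss, which the abstract does not specify."""
    return np.maximum(0.0, margin - (score_a - score_b))

# Example: the network scored the preferred shot only 0.3 higher than the
# other, so the pair is still penalised (gap smaller than the margin).
print(margin_ranking_loss(2.1, 1.8))   # 0.7
```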
Dimensionality Reduction on Grassmannian via Riemannian Optimization: A Generalized Perspective
Title | Dimensionality Reduction on Grassmannian via Riemannian Optimization: A Generalized Perspective |
Authors | Tianci Liu, Zelin Shi, Yunpeng Liu |
Abstract | This paper proposes a generalized framework with joint normalization which learns lower-dimensional subspaces with maximum discriminative power by making use of Riemannian geometry. In particular, we model the similarity/dissimilarity between subspaces using various metrics defined on the Grassmannian and formulate dimensionality reduction as a non-linear constrained optimization problem that accounts for orthogonalization. To obtain the linear mapping, we derive the components required to perform Riemannian optimization (e.g., Riemannian conjugate gradient) from the original Grassmannian through an orthonormal projection. We respect the Riemannian geometry of the Grassmann manifold and search for this projection directly from one Grassmann manifold to another, without any additional transformations. In this natural geometry-aware way, any metric on the Grassmann manifold can, in principle, be incorporated into our model. We have combined five metrics with our model, and the learning process can be treated as an unconstrained optimization problem on a Grassmann manifold. Experiments on several datasets demonstrate that our approach leads to a significant accuracy gain over state-of-the-art methods. |
Tasks | Dimensionality Reduction |
Published | 2017-11-17 |
URL | http://arxiv.org/abs/1711.06382v1 |
http://arxiv.org/pdf/1711.06382v1.pdf | |
PWC | https://paperswithcode.com/paper/dimensionality-reduction-on-grassmannian-via |
Repo | |
Framework | |
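As a concrete example of a metric such a framework can accommodate (an illustration only, not the paper's learning procedure), the projection metric between two subspaces with orthonormal basis matrices can be computed as follows:

```python
import numpy as np

def projection_metric(X, Y):
    """Projection (chordal) distance between span(X) and span(Y), where X and Y
    have orthonormal columns: d = ||X X^T - Y Y^T||_F / sqrt(2). One of several
    Grassmannian metrics; shown here only as an illustration."""
    P, Q = X @ X.T, Y @ Y.T
    return np.linalg.norm(P - Q, ord="fro") / np.sqrt(2)

# Two 2-dimensional subspaces of R^4, obtained by QR orthonormalization.
rng = np.random.default_rng(0)
X, _ = np.linalg.qr(rng.standard_normal((4, 2)))
Y, _ = np.linalg.qr(rng.standard_normal((4, 2)))
print(projection_metric(X, Y))
```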
Automatic Disambiguation of French Discourse Connectives
Title | Automatic Disambiguation of French Discourse Connectives |
Authors | Majid Laali, Leila Kosseim |
Abstract | Discourse connectives (e.g., however, because) are terms that can explicitly convey a discourse relation within a text. While discourse connectives have been shown to be an effective clue for automatically identifying discourse relations, they are not always used to convey such relations, and thus they must first be disambiguated between discourse usage and non-discourse usage. In this paper, we investigate the applicability of features proposed for the disambiguation of English discourse connectives to French. Our results on the French Discourse Treebank (FDTB) show that syntactic and lexical features developed for English texts are as effective for French and allow the disambiguation of French discourse connectives with an accuracy of 94.2%. |
Tasks | |
Published | 2017-04-18 |
URL | http://arxiv.org/abs/1704.05162v1 |
http://arxiv.org/pdf/1704.05162v1.pdf | |
PWC | https://paperswithcode.com/paper/automatic-disambiguation-of-french-discourse |
Repo | |
Framework | |
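A hedged sketch of the kind of lexical/contextual feature pipeline such disambiguation typically uses; the feature set and the toy labels below are assumptions, and the paper additionally uses syntactic features derived from parse trees.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def connective_features(tokens, i):
    """Hypothetical lexical/contextual features for the candidate connective
    at token position i."""
    return {
        "connective": tokens[i].lower(),
        "prev_word": tokens[i - 1].lower() if i > 0 else "<s>",
        "next_word": tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>",
        "sentence_initial": i == 0,
    }

# Toy training pair: "cependant" labeled as discourse vs. non-discourse usage.
X = [connective_features("Cependant , il pleut .".split(), 0),
     connective_features("Il a travaillé cependant toute la nuit .".split(), 3)]
y = [1, 0]   # 1 = discourse usage, 0 = non-discourse usage (toy labels)

clf = make_pipeline(DictVectorizer(), LogisticRegression())
clf.fit(X, y)
```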
UTS submission to Google YouTube-8M Challenge 2017
Title | UTS submission to Google YouTube-8M Challenge 2017 |
Authors | Linchao Zhu, Yanbin Liu, Yi Yang |
Abstract | In this paper, we present our solution to Google YouTube-8M Video Classification Challenge 2017. We leveraged both video-level and frame-level features in the submission. For video-level classification, we simply used a 200-mixture Mixture of Experts (MoE) layer, which achieves GAP 0.802 on the validation set with a single model. For frame-level classification, we utilized several variants of recurrent neural networks, sequence aggregation with attention mechanism and 1D convolutional models. We achieved GAP 0.8408 on the private testing set with the ensemble model. The source code of our models can be found in \url{https://github.com/ffmpbgrnn/yt8m}. |
Tasks | Video Classification |
Published | 2017-07-13 |
URL | http://arxiv.org/abs/1707.04143v1 |
http://arxiv.org/pdf/1707.04143v1.pdf | |
PWC | https://paperswithcode.com/paper/uts-submission-to-google-youtube-8m-challenge |
Repo | |
Framework | |
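A simplified sketch of a video-level Mixture-of-Experts head. The YouTube-8M baseline MoE gates each class separately and includes a dummy expert; the single global gate and tiny expert count below are simplifications made for brevity.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def moe_predict(x, W_gate, W_experts):
    """Mixture-of-Experts head: each of K experts produces per-class sigmoid
    probabilities, a softmax gate weights the experts, and the result is the
    mixture probability. Shapes: x (D,), W_gate (K, D), W_experts (K, C, D)."""
    gate = softmax(W_gate @ x)                               # (K,)
    expert_probs = 1.0 / (1.0 + np.exp(-(W_experts @ x)))    # (K, C)
    return gate @ expert_probs                               # (C,)

rng = np.random.default_rng(0)
D, C, K = 1024, 4716, 4        # feature dim, classes, experts (toy K, not 200)
probs = moe_predict(rng.standard_normal(D),
                    rng.standard_normal((K, D)) * 0.01,
                    rng.standard_normal((K, C, D)) * 0.01)
print(probs.shape)             # (4716,)
```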
CAR-Net: Clairvoyant Attentive Recurrent Network
Title | CAR-Net: Clairvoyant Attentive Recurrent Network |
Authors | Amir Sadeghian, Ferdinand Legros, Maxime Voisin, Ricky Vesel, Alexandre Alahi, Silvio Savarese |
Abstract | We present an interpretable framework for path prediction that leverages dependencies between agents’ behaviors and their spatial navigation environment. We exploit two sources of information: the past motion trajectory of the agent of interest and a wide top-view image of the navigation scene. We propose a Clairvoyant Attentive Recurrent Network (CAR-Net) that learns where to look in a large image of the scene when solving the path prediction task. Our method can attend to any area, or combination of areas, within the raw image (e.g., road intersections) when predicting the trajectory of the agent. This allows us to visualize fine-grained semantic elements of navigation scenes that influence the prediction of trajectories. To study the impact of space on agents’ trajectories, we build a new dataset made of top-view images of hundreds of scenes (Formula One racing tracks) where agents’ behaviors are heavily influenced by known areas in the images (e.g., upcoming turns). CAR-Net successfully attends to these salient regions. Additionally, CAR-Net reaches state-of-the-art accuracy on the standard trajectory forecasting benchmark, Stanford Drone Dataset (SDD). Finally, we show CAR-Net’s ability to generalize to unseen scenes. |
Tasks | |
Published | 2017-11-28 |
URL | http://arxiv.org/abs/1711.10061v3 |
http://arxiv.org/pdf/1711.10061v3.pdf | |
PWC | https://paperswithcode.com/paper/car-net-clairvoyant-attentive-recurrent |
Repo | |
Framework | |
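A generic soft visual attention step over a flattened top-view feature grid, shown only to illustrate the mechanism the abstract refers to; it is not the exact CAR-Net architecture, and the bilinear scoring form is an assumption.

```python
import numpy as np

def soft_attention(features, hidden, W):
    """Score each spatial cell against the recurrent hidden state, softmax the
    scores, and return the attention-weighted context vector.
    features: (N, D) flattened grid cells, hidden: (H,), W: (D, H)."""
    scores = features @ (W @ hidden)            # (N,) alignment scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                    # softmax over grid cells
    context = weights @ features                # (D,) context fed to the RNN
    return context, weights

rng = np.random.default_rng(0)
feats = rng.standard_normal((14 * 14, 256))     # e.g. a 14x14 CNN feature map
ctx, attn = soft_attention(feats, rng.standard_normal(128),
                           rng.standard_normal((256, 128)) * 0.1)
print(ctx.shape, attn.argmax())                 # context vector and most-attended cell
```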
A generalization of the Jensen divergence: The chord gap divergence
Title | A generalization of the Jensen divergence: The chord gap divergence |
Authors | Frank Nielsen |
Abstract | We introduce a novel family of distances, called the chord gap divergences, that generalizes the Jensen divergences (also called the Burbea-Rao distances), and study its properties. It follows a generalization of the celebrated statistical Bhattacharyya distance that is frequently met in applications. We report an iterative concave-convex procedure for computing centroids, and analyze the performance of the $k$-means++ clustering with respect to that new dissimilarity measure by introducing the Taylor-Lagrange remainder form of the skew Jensen divergences. |
Tasks | |
Published | 2017-09-29 |
URL | http://arxiv.org/abs/1709.10498v2 |
http://arxiv.org/pdf/1709.10498v2.pdf | |
PWC | https://paperswithcode.com/paper/a-generalization-of-the-jensen-divergence-the |
Repo | |
Framework | |
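For reference, the Jensen (Burbea-Rao) divergence and its skewed variant, which the chord gap divergence generalizes, are, for a strictly convex generator $F$ and $\alpha \in (0,1)$:

$$
J_F(p, q) = \frac{F(p) + F(q)}{2} - F\!\left(\frac{p + q}{2}\right),
\qquad
J_F^{\alpha}(p : q) = \alpha F(p) + (1 - \alpha) F(q) - F\bigl(\alpha p + (1 - \alpha) q\bigr).
$$

The definition of the chord gap divergence itself is given in the paper.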
Fast and Efficient Skin Detection for Facial Detection
Title | Fast and Efficient Skin Detection for Facial Detection |
Authors | Mohammad Reza Mahmoodi |
Abstract | In this paper, an efficient skin detection system is proposed. The algorithm is based on a very fast and efficient pre-processing step utilizing the concept of ternary conversion to identify candidate windows and, subsequently, a novel local two-stage diffusion method which achieves an F-score of 0.5978 on the SDD dataset. The pre-processing step is shown to boost the speed of the system by eliminating 82% of an image on average while keeping the true positive rate above 98%. In addition, a novel segmentation algorithm is designed to process candidate windows, which is quantitatively and qualitatively shown to be very efficient in terms of accuracy. The algorithm has been implemented on an FPGA to obtain real-time processing speed. The system is fully pipelined, and the inherent parallel structure of the algorithm is fully exploited to maximize performance. The system is implemented on a Spartan-6 LXT45 Xilinx FPGA and is capable of processing 98 frames of 640*480 24-bit color images per second. |
Tasks | |
Published | 2017-01-19 |
URL | http://arxiv.org/abs/1701.05595v1 |
http://arxiv.org/pdf/1701.05595v1.pdf | |
PWC | https://paperswithcode.com/paper/fast-and-efficient-skin-detection-for-facial |
Repo | |
Framework | |
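The abstract does not give the thresholds of the ternary-conversion pre-processing, so the sketch below substitutes a classical explicit RGB skin rule purely to illustrate the idea of a cheap pre-filter that discards most windows before the expensive stage.

```python
import numpy as np

def skin_prefilter(img_rgb):
    """Classical explicit RGB skin rule (Peer et al. style), used here as a
    stand-in for the paper's ternary-conversion step. img_rgb: (H, W, 3) uint8."""
    r = img_rgb[..., 0].astype(np.int16)
    g = img_rgb[..., 1].astype(np.int16)
    b = img_rgb[..., 2].astype(np.int16)
    spread = (img_rgb.max(axis=-1).astype(np.int16) -
              img_rgb.min(axis=-1).astype(np.int16))
    return ((r > 95) & (g > 40) & (b > 20) & (spread > 15) &
            (np.abs(r - g) > 15) & (r > g) & (r > b))   # True = skin candidate

def keep_window(mask, y, x, h, w, ratio=0.2):
    """Keep a candidate window only if enough of its pixels pass the pre-filter."""
    return mask[y:y + h, x:x + w].mean() > ratio
```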
Application of generative autoencoder in de novo molecular design
Title | Application of generative autoencoder in de novo molecular design |
Authors | Thomas Blaschke, Marcus Olivecrona, Ola Engkvist, Jürgen Bajorath, Hongming Chen |
Abstract | A major challenge in computational chemistry is the generation of novel molecular structures with desirable pharmacological and physiochemical properties. In this work, we investigate the potential use of autoencoders, a deep learning methodology, for de novo molecular design. Various generative autoencoders were used to map molecular structures into a continuous latent space and vice versa, and their performance as structure generators was assessed. Our results show that the latent space preserves the chemical similarity principle and thus can be used for the generation of analogue structures. Furthermore, the latent space created by the autoencoders was searched systematically to generate novel compounds with predicted activity against dopamine receptor type 2, and compounds similar to known active compounds not included in the training set were identified. |
Tasks | |
Published | 2017-11-21 |
URL | http://arxiv.org/abs/1711.07839v1 |
http://arxiv.org/pdf/1711.07839v1.pdf | |
PWC | https://paperswithcode.com/paper/application-of-generative-autoencoder-in-de |
Repo | |
Framework | |
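A hedged sketch of latent-space analogue generation around a seed molecule. `encode` and `decode` are hypothetical stand-ins for a trained generative autoencoder, and the Gaussian perturbation is one simple search strategy, not necessarily the paper's systematic search.

```python
import numpy as np

def sample_analogues(encode, decode, seed_smiles, n=50, sigma=0.1, rng=None):
    """Generate candidate analogue structures by perturbing the latent vector
    of a seed molecule with Gaussian noise and decoding each perturbation.
    `encode`/`decode` are hypothetical SMILES <-> latent mappings."""
    rng = rng or np.random.default_rng()
    z = encode(seed_smiles)                     # latent vector of the seed
    candidates = set()
    for _ in range(n):
        smiles = decode(z + sigma * rng.standard_normal(z.shape))
        if smiles:                              # decoder may return None/invalid
            candidates.add(smiles)
    return candidates
```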
Data Fusion on Motion and Magnetic Sensors embedded on Mobile Devices for the Identification of Activities of Daily Living
Title | Data Fusion on Motion and Magnetic Sensors embedded on Mobile Devices for the Identification of Activities of Daily Living |
Authors | Ivan Miguel Pires, Nuno M. Garcia, Nuno Pombo, Francisco Flórez-Revuelta, Susanna Spinsante |
Abstract | Several types of sensors are available in off-the-shelf mobile devices, including motion, magnetic, vision, acoustic, and location sensors. This paper focuses on the fusion of data acquired from motion and magnetic sensors, i.e., accelerometer, gyroscope and magnetometer sensors, for the recognition of Activities of Daily Living (ADL) using pattern recognition techniques. The system developed in this study includes data acquisition, data processing, data fusion, and artificial intelligence methods. The artificial intelligence methods include Artificial Neural Networks (ANN), which are used in this study for the recognition of ADL. The purpose of this study is the creation of a new ANN-based method for the identification of ADL, comparing three types of ANN in order to achieve results with reliable accuracy. The best accuracy was obtained with Deep Learning, which, after the application of L2 regularization and normalization techniques on the sensor data, reports an accuracy of 89.51%. |
Tasks | L2 Regularization |
Published | 2017-10-31 |
URL | http://arxiv.org/abs/1711.07328v1 |
http://arxiv.org/pdf/1711.07328v1.pdf | |
PWC | https://paperswithcode.com/paper/data-fusion-on-motion-and-magnetic-sensors |
Repo | |
Framework | |
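A minimal sketch of the reported recipe (normalization, an ANN, and L2 regularization) on fused motion/magnetic features; the feature set, architecture, and hyperparameters below are assumptions, not the study's exact configuration.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Toy fused feature matrix: per-window statistics from accelerometer,
# gyroscope and magnetometer (randomly generated here for illustration).
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 30))      # 200 windows x 30 fused features
y = rng.integers(0, 5, size=200)        # 5 ADL classes (toy labels)

model = make_pipeline(
    StandardScaler(),                   # normalization of sensor features
    MLPClassifier(hidden_layer_sizes=(64, 32),
                  alpha=1e-2,           # alpha is the L2 regularization term
                  max_iter=500,
                  random_state=0),
)
model.fit(X, y)
```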
Efficient Online Bandit Multiclass Learning with $\tilde{O}(\sqrt{T})$ Regret
Title | Efficient Online Bandit Multiclass Learning with $\tilde{O}(\sqrt{T})$ Regret |
Authors | Alina Beygelzimer, Francesco Orabona, Chicheng Zhang |
Abstract | We present an efficient second-order algorithm with $\tilde{O}(\frac{1}{\eta}\sqrt{T})$ regret for the bandit online multiclass problem. The regret bound holds simultaneously with respect to a family of loss functions parameterized by $\eta$, for a range of $\eta$ restricted by the norm of the competitor. The family of loss functions ranges from hinge loss ($\eta=0$) to squared hinge loss ($\eta=1$). This provides a solution to the open problem of (J. Abernethy and A. Rakhlin. An efficient bandit algorithm for $\sqrt{T}$-regret in online multiclass prediction? In COLT, 2009). We test our algorithm experimentally, showing that it also performs favorably against earlier algorithms. |
Tasks | |
Published | 2017-02-25 |
URL | http://arxiv.org/abs/1702.07958v3 |
http://arxiv.org/pdf/1702.07958v3.pdf | |
PWC | https://paperswithcode.com/paper/efficient-online-bandit-multiclass-learning |
Repo | |
Framework | |
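To illustrate the bandit multiclass protocol the paper studies (commit to a label, observe only whether it was correct), here is the classic first-order Banditron update; the paper's contribution is a different, second-order algorithm with an improved $\tilde{O}(\sqrt{T})$ regret bound.

```python
import numpy as np

def banditron_step(W, x, y_true, gamma=0.02, rng=None):
    """One round of the Banditron (Kakade et al.), shown only to illustrate
    the bandit feedback model. W: (K, D) weight matrix, x: (D,) features."""
    rng = rng or np.random.default_rng()
    K = W.shape[0]
    y_hat = int(np.argmax(W @ x))                # greedy prediction
    p = np.full(K, gamma / K)
    p[y_hat] += 1.0 - gamma                      # exploration distribution
    y_tilde = rng.choice(K, p=p)                 # label actually played
    feedback = int(y_tilde == y_true)            # only this single bit is revealed
    U = np.zeros_like(W)
    U[y_tilde] += (feedback / p[y_tilde]) * x    # unbiased positive part
    U[y_hat] -= x                                # negative part
    return W + U, y_tilde, feedback
```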
NISP: Pruning Networks using Neuron Importance Score Propagation
Title | NISP: Pruning Networks using Neuron Importance Score Propagation |
Authors | Ruichi Yu, Ang Li, Chun-Fu Chen, Jui-Hsin Lai, Vlad I. Morariu, Xintong Han, Mingfei Gao, Ching-Yung Lin, Larry S. Davis |
Abstract | To reduce the significant redundancy in deep Convolutional Neural Networks (CNNs), most existing methods prune neurons by considering only the statistics of an individual layer or two consecutive layers (e.g., pruning one layer to minimize the reconstruction error of the next layer), ignoring the effect of error propagation in deep networks. In contrast, we argue that it is essential to prune neurons in the entire neural network jointly based on a unified goal: minimizing the reconstruction error of important responses in the “final response layer” (FRL), which is the second-to-last layer before classification, so that the pruned network retains its predictive power. Specifically, we apply feature ranking techniques to measure the importance of each neuron in the FRL, formulate network pruning as a binary integer optimization problem, and derive a closed-form solution to it for pruning neurons in earlier layers. Based on our theoretical analysis, we propose the Neuron Importance Score Propagation (NISP) algorithm to propagate the importance scores of final responses to every neuron in the network. The CNN is pruned by removing neurons with the least importance, and then fine-tuned to retain its predictive power. NISP is evaluated on several datasets with multiple CNN models and demonstrated to achieve significant acceleration and compression with negligible accuracy loss. |
Tasks | Network Pruning |
Published | 2017-11-16 |
URL | http://arxiv.org/abs/1711.05908v3 |
http://arxiv.org/pdf/1711.05908v3.pdf | |
PWC | https://paperswithcode.com/paper/nisp-pruning-networks-using-neuron-importance |
Repo | |
Framework | |
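A sketch of the importance-propagation idea for fully connected layers; the rule below is a simplified reading of the abstract (propagate scores backward through absolute weights), while the paper also covers convolutional and pooling layers and derives the rule from its optimization problem.

```python
import numpy as np

def propagate_importance(weights, final_scores):
    """Back-propagate neuron importance from the final response layer toward
    the input, roughly s_l = |W_{l+1}|^T s_{l+1}.
    weights[l] has shape (n_{l+1}, n_l) and maps layer l to layer l+1."""
    scores = [None] * len(weights) + [final_scores]
    for l in range(len(weights) - 1, -1, -1):
        scores[l] = np.abs(weights[l]).T @ scores[l + 1]
    return scores

def prune_layer(score, keep_ratio=0.5):
    """Keep the highest-scoring neurons of a layer; the pruned network is
    then fine-tuned to recover accuracy."""
    k = int(len(score) * keep_ratio)
    return np.argsort(score)[-k:]
```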
Search Engine Guided Non-Parametric Neural Machine Translation
Title | Search Engine Guided Non-Parametric Neural Machine Translation |
Authors | Jiatao Gu, Yong Wang, Kyunghyun Cho, Victor O. K. Li |
Abstract | In this paper, we extend an attention-based neural machine translation (NMT) model by allowing it to access an entire training set of parallel sentence pairs even after training. The proposed approach consists of two stages. In the first stage (retrieval), an off-the-shelf, black-box search engine is used to retrieve a small subset of sentence pairs from the training set given a source sentence. These pairs are further filtered based on a fuzzy matching score derived from edit distance. In the second stage (translation), a novel translation model, called translation memory enhanced NMT (TM-NMT), seamlessly uses both the source sentence and the set of retrieved sentence pairs to perform the translation. Empirical evaluation on three language pairs (En-Fr, En-De, and En-Es) shows that the proposed approach significantly outperforms the baseline, and the improvement is more significant when more relevant sentence pairs are retrieved. |
Tasks | Machine Translation |
Published | 2017-05-20 |
URL | http://arxiv.org/abs/1705.07267v2 |
http://arxiv.org/pdf/1705.07267v2.pdf | |
PWC | https://paperswithcode.com/paper/search-engine-guided-non-parametric-neural |
Repo | |
Framework | |
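A sketch of edit-distance-based fuzzy matching for filtering retrieved sentence pairs; the exact normalization of the score is an assumption, not necessarily the paper's formula.

```python
def edit_distance(a, b):
    """Word-level Levenshtein distance between two tokenized sentences."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, 1):
        cur = [i]
        for j, tok_b in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                       # deletion
                           cur[j - 1] + 1,                    # insertion
                           prev[j - 1] + (tok_a != tok_b)))   # substitution
        prev = cur
    return prev[-1]

def fuzzy_match(src, retrieved, threshold=0.5):
    """Keep retrieved (source, target) pairs whose fuzzy score
    1 - dist / max(len) against the query source exceeds a threshold."""
    kept = []
    for cand_src, cand_tgt in retrieved:
        d = edit_distance(src.split(), cand_src.split())
        score = 1.0 - d / max(len(src.split()), len(cand_src.split()))
        if score >= threshold:
            kept.append((cand_src, cand_tgt, score))
    return kept
```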
Learning to Remember Rare Events
Title | Learning to Remember Rare Events |
Authors | Łukasz Kaiser, Ofir Nachum, Aurko Roy, Samy Bengio |
Abstract | Despite recent advances, memory-augmented deep neural networks are still limited when it comes to life-long and one-shot learning, especially in remembering rare events. We present a large-scale life-long memory module for use in deep learning. The module exploits fast nearest-neighbor algorithms for efficiency and thus scales to large memory sizes. Except for the nearest-neighbor query, the module is fully differentiable and trained end-to-end with no extra supervision. It operates in a life-long manner, i.e., without the need to reset it during training. Our memory module can be easily added to any part of a supervised neural network. To show its versatility we add it to a number of networks, from simple convolutional ones tested on image classification to deep sequence-to-sequence and recurrent-convolutional models. In all cases, the enhanced network gains the ability to remember and do life-long one-shot learning. Our module remembers training examples shown many thousands of steps in the past and it can successfully generalize from them. We set new state-of-the-art for one-shot learning on the Omniglot dataset and demonstrate, for the first time, life-long one-shot learning in recurrent neural networks on a large-scale machine translation task. |
Tasks | Few-Shot Image Classification, Image Classification, Machine Translation, Omniglot, One-Shot Learning |
Published | 2017-03-09 |
URL | http://arxiv.org/abs/1703.03129v1 |
http://arxiv.org/pdf/1703.03129v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-remember-rare-events |
Repo | |
Framework | |
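A minimal sketch of the memory lookup described above: keys are stored L2-normalized and a query returns the values of its nearest neighbours by cosine similarity. The training-time update rules (averaging keys on correct hits, overwriting stale slots otherwise) are omitted here.

```python
import numpy as np

def query_memory(keys, values, q, k=1):
    """Nearest-neighbour lookup in a key-value memory: return the values of
    the k stored keys most similar (by cosine similarity) to the query."""
    q = q / np.linalg.norm(q)
    sims = keys @ q                          # keys are stored L2-normalized
    nearest = np.argsort(-sims)[:k]
    return values[nearest], sims[nearest]

rng = np.random.default_rng(0)
keys = rng.standard_normal((1024, 64))
keys /= np.linalg.norm(keys, axis=1, keepdims=True)
values = rng.integers(0, 100, size=1024)    # e.g. class labels stored as values
vals, sims = query_memory(keys, values, rng.standard_normal(64), k=3)
```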
On the Challenges of Sentiment Analysis for Dynamic Events
Title | On the Challenges of Sentiment Analysis for Dynamic Events |
Authors | Monireh Ebrahimi, Amir Hossein Yazdavar, Amit Sheth |
Abstract | With the proliferation of social media over the last decade, determining people’s attitude with respect to a specific topic, document, interaction or event has fueled research interest in natural language processing and introduced a new research channel called sentiment and emotion analysis. For instance, businesses routinely look to develop systems that automatically understand their customer conversations by identifying the relevant content to enhance the marketing of their products and the management of their reputations. Previous efforts to assess people’s sentiment on Twitter have suggested that Twitter may be a valuable resource for studying political sentiment and that it reflects the offline political landscape. According to a Pew Research Center report, in January 2016, 44 percent of US adults stated having learned about the presidential election through social media. Furthermore, 24 percent reported using social media posts of the two candidates as a source of news and information, which is more than the 15 percent who used both candidates’ websites or emails combined. The first presidential debate between Trump and Hillary Clinton was the most tweeted debate ever, with 17.1 million tweets. |
Tasks | Emotion Recognition, Sentiment Analysis |
Published | 2017-10-06 |
URL | http://arxiv.org/abs/1710.02514v1 |
http://arxiv.org/pdf/1710.02514v1.pdf | |
PWC | https://paperswithcode.com/paper/on-the-challenges-of-sentiment-analysis-for |
Repo | |
Framework | |