Paper Group AWR 131
Learning Latent Super-Events to Detect Multiple Activities in Videos. NeuralFDR: Learning Discovery Thresholds from Hypothesis Features. Fast Threshold Tests for Detecting Discrimination. DeepUNet: A Deep Fully Convolutional Network for Pixel-level Sea-Land Segmentation. SMSSVD - SubMatrix Selection Singular Value Decomposition. Compressing Word Em …
Learning Latent Super-Events to Detect Multiple Activities in Videos
Title | Learning Latent Super-Events to Detect Multiple Activities in Videos |
Authors | AJ Piergiovanni, Michael S. Ryoo |
Abstract | In this paper, we introduce the concept of learning latent super-events from activity videos, and present how it benefits activity detection in continuous videos. We define a super-event as a set of multiple events occurring together in videos with a particular temporal organization; it is the opposite of the concept of sub-events. Real-world videos contain multiple activities and are rarely segmented (e.g., surveillance videos), and learning latent super-events allows the model to capture how the events are temporally related in videos. We design temporal structure filters that enable the model to focus on particular sub-intervals of the videos, and use them together with a soft attention mechanism to learn representations of latent super-events. Super-event representations are combined with per-frame or per-segment CNNs to provide frame-level annotations. Our approach is designed to be fully differentiable, enabling end-to-end learning of latent super-event representations jointly with the activity detector using them. Our experiments with multiple public video datasets confirm that the proposed concept of latent super-event learning significantly benefits activity detection, advancing the state of the art. |
Tasks | Action Detection, Activity Detection |
Published | 2017-12-05 |
URL | http://arxiv.org/abs/1712.01938v2, http://arxiv.org/pdf/1712.01938v2.pdf |
PWC | https://paperswithcode.com/paper/learning-latent-super-events-to-detect |
Repo | https://github.com/piergiaj/super-events-cvpr18 |
Framework | pytorch |
NeuralFDR: Learning Discovery Thresholds from Hypothesis Features
Title | NeuralFDR: Learning Discovery Thresholds from Hypothesis Features |
Authors | Fei Xia, Martin J. Zhang, James Zou, David Tse |
Abstract | As datasets grow richer, an important challenge is to leverage the full features in the data to maximize the number of useful discoveries while controlling for false positives. We address this problem in the context of multiple hypothesis testing, where for each hypothesis, we observe a p-value along with a set of features specific to that hypothesis. For example, in genetic association studies, each hypothesis tests the correlation between a variant and the trait. We have a rich set of features for each variant (e.g., its location, conservation, and epigenetics) which could inform how likely the variant is to have a true association. However, popular testing approaches, such as Benjamini-Hochberg’s procedure (BH) and independent hypothesis weighting (IHW), either ignore these features or assume that the features are categorical or univariate. We propose a new algorithm, NeuralFDR, which automatically learns a discovery threshold as a function of all the hypothesis features. We parametrize the discovery threshold as a neural network, which enables flexible handling of multi-dimensional discrete and continuous features as well as efficient end-to-end optimization. We prove that NeuralFDR has strong false discovery rate (FDR) guarantees, and show that it makes substantially more discoveries in synthetic and real datasets. Moreover, we demonstrate that the learned discovery threshold is directly interpretable. |
Tasks | |
Published | 2017-11-03 |
URL | http://arxiv.org/abs/1711.01312v4, http://arxiv.org/pdf/1711.01312v4.pdf |
PWC | https://paperswithcode.com/paper/neuralfdr-learning-discovery-thresholds-from |
Repo | https://github.com/fxia22/NeuralFDR |
Framework | pytorch |
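For orientation, the Benjamini-Hochberg (BH) baseline named in the abstract can be sketched in a few lines. This is the standard step-up procedure, not NeuralFDR itself: it uses one threshold for all hypotheses and ignores their features. The function name `benjamini_hochberg` and the sample p-values are illustrative assumptions.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where the hypothesis is rejected.

    Standard BH step-up rule: find the largest rank k (1-based, after
    sorting p-values in ascending order) with p_(k) <= k * alpha / m,
    then reject the k smallest p-values.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            cutoff_rank = rank  # keep the largest rank that passes
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values, for illustration only.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(p, alpha=0.05))
# -> [True, True, False, False, False, False, False, False, False, False]
```

Because the rule is step-up, a late p-value that clears its threshold also rescues all smaller ones; the abstract's contrast is that NeuralFDR replaces the single feature-blind threshold with one that depends on hypothesis features.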
Fast Threshold Tests for Detecting Discrimination
Title | Fast Threshold Tests for Detecting Discrimination |
Authors | Emma Pierson, Sam Corbett-Davies, Sharad Goel |
Abstract | Threshold tests have recently been proposed as a useful method for detecting bias in lending, hiring, and policing decisions. For example, in the case of credit extensions, these tests aim to estimate the bar for granting loans to white and minority applicants, with a higher inferred threshold for minorities indicative of discrimination. This technique, however, requires fitting a complex Bayesian latent variable model for which inference is often computationally challenging. Here we develop a method for fitting threshold tests that is two orders of magnitude faster than the existing approach, reducing computation from hours to minutes. To achieve these performance gains, we introduce and analyze a flexible family of probability distributions on the interval [0, 1] – which we call discriminant distributions – that is computationally efficient to work with. We demonstrate our technique by analyzing 2.7 million police stops of pedestrians in New York City. |
Tasks | |
Published | 2017-02-27 |
URL | http://arxiv.org/abs/1702.08536v3, http://arxiv.org/pdf/1702.08536v3.pdf |
PWC | https://paperswithcode.com/paper/fast-threshold-tests-for-detecting |
Repo | https://github.com/5harad/fasttt |
Framework | none |
DeepUNet: A Deep Fully Convolutional Network for Pixel-level Sea-Land Segmentation
Title | DeepUNet: A Deep Fully Convolutional Network for Pixel-level Sea-Land Segmentation |
Authors | Ruirui Li, Wenjie Liu, Lei Yang, Shihao Sun, Wei Hu, Fan Zhang, Wei Li |
Abstract | Semantic segmentation is a fundamental research topic in remote sensing image processing. Because of the complex maritime environment, sea-land segmentation is a challenging task. Although neural networks have achieved excellent performance in semantic segmentation in recent years, few works use CNNs for sea-land segmentation, and the results could be further improved. This paper proposes a novel deep convolutional neural network named DeepUNet. Like U-Net, its structure has a contracting path and an expansive path to produce high-resolution output. Unlike U-Net, however, DeepUNet uses DownBlocks instead of convolution layers in the contracting path and UpBlocks in the expansive path. The two novel blocks bring two new connections, the U-connection and the Plus connection, which are intended to yield more precise segmentation results. To verify our network architecture, we built a new, challenging sea-land dataset and compared DeepUNet on it with SegNet and U-Net. Experimental results show that DeepUNet achieves good performance compared with other architectures, especially on high-resolution remote sensing imagery. |
Tasks | Semantic Segmentation |
Published | 2017-09-01 |
URL | http://arxiv.org/abs/1709.00201v1, http://arxiv.org/pdf/1709.00201v1.pdf |
PWC | https://paperswithcode.com/paper/deepunet-a-deep-fully-convolutional-network |
Repo | https://github.com/Lkruitwagen/remote-sensing-solar-pv |
Framework | none |
SMSSVD - SubMatrix Selection Singular Value Decomposition
Title | SMSSVD - SubMatrix Selection Singular Value Decomposition |
Authors | Rasmus Henningsson, Magnus Fontes |
Abstract | High-throughput biomedical measurements normally capture multiple overlaid biologically relevant signals and often also signals representing different types of technical artefacts, such as batch effects. Signal identification and decomposition are accordingly main objectives in statistical biomedical modeling and data analysis. Existing methods aimed at signal reconstruction and deconvolution are, in general, either supervised, contain parameters that need to be estimated, or present other types of ad hoc features. We here introduce SubMatrix Selection Singular Value Decomposition (SMSSVD), a parameter-free unsupervised signal decomposition and dimension reduction method, designed to reduce noise adaptively for each low-rank signal in a given data matrix and to represent the signals in the data in a way that enables unbiased exploratory analysis and reconstruction of multiple overlaid signals, including identifying groups of variables that drive different signals. The SMSSVD method produces a denoised signal decomposition from a given data matrix. It guarantees orthogonality between signal components in a straightforward manner and is designed to make automation possible. We illustrate SMSSVD by applying it to several real and synthetic datasets and compare its performance to gold-standard methods such as PCA (Principal Component Analysis) and SPC (Sparse Principal Components, using Lasso constraints). SMSSVD is computationally efficient and, despite being a parameter-free method, in general outperforms existing statistical learning methods. A Julia implementation of SMSSVD is openly available on GitHub (https://github.com/rasmushenningsson/SMSSVD.jl). |
Tasks | Dimensionality Reduction |
Published | 2017-10-23 |
URL | http://arxiv.org/abs/1710.08144v1, http://arxiv.org/pdf/1710.08144v1.pdf |
PWC | https://paperswithcode.com/paper/smssvd-submatrix-selection-singular-value |
Repo | https://github.com/rasmushenningsson/SMSSVD.jl |
Framework | none |
Compressing Word Embeddings via Deep Compositional Code Learning
Title | Compressing Word Embeddings via Deep Compositional Code Learning |
Authors | Raphael Shu, Hideki Nakayama |
Abstract | Natural language processing (NLP) models often require a massive number of parameters for word embeddings, resulting in a large storage or memory footprint. Deploying neural NLP models to mobile devices requires compressing the word embeddings without any significant sacrifices in performance. For this purpose, we propose to construct the embeddings with a few basis vectors. For each word, the composition of basis vectors is determined by a hash code. To maximize the compression rate, we adopt the multi-codebook quantization approach instead of a binary coding scheme. Each code is composed of multiple discrete numbers, such as (3, 2, 1, 8), where the value of each component is limited to a fixed range. We propose to directly learn the discrete codes in an end-to-end neural network by applying the Gumbel-softmax trick. Experiments show the compression rate achieves 98% in a sentiment analysis task and 94% ~ 99% in machine translation tasks without performance loss. In both tasks, the proposed method can improve the model performance by slightly lowering the compression rate. Compared to other approaches such as character-level segmentation, the proposed method is language-independent and does not require modifications to the network architecture. |
Tasks | Machine Translation, Quantization, Sentiment Analysis, Word Embeddings |
Published | 2017-11-03 |
URL | http://arxiv.org/abs/1711.01068v2, http://arxiv.org/pdf/1711.01068v2.pdf |
PWC | https://paperswithcode.com/paper/compressing-word-embeddings-via-deep |
Repo | https://github.com/zomux/neuralcompressor |
Framework | tf |
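The abstract describes building each word embedding from a few basis vectors chosen by a discrete code such as (3, 2, 1, 8). A minimal decoding sketch follows, assuming one codebook per code component, zero-based indices, and summation as the composition rule; none of these details are stated in the abstract, and the codebook values here are made up. Learning the codes (the Gumbel-softmax part) is not shown.

```python
def decode_embedding(code, codebooks):
    """Rebuild an embedding by summing one basis vector per codebook.

    code:      tuple of integers, one index per codebook (assumed 0-based).
    codebooks: list of codebooks; each codebook is a list of basis vectors.
    """
    dim = len(codebooks[0][0])
    embedding = [0.0] * dim
    for codebook, index in zip(codebooks, code):
        basis = codebook[index]
        for d in range(dim):
            embedding[d] += basis[d]
    return embedding


# Hypothetical setup: 4 codebooks, 16 basis vectors each, 2-dimensional.
# Basis vector k of codebook m is [m + 1, k], chosen so the sum is easy to check.
codebooks = [[[float(m + 1), float(k)] for k in range(16)] for m in range(4)]

print(decode_embedding((3, 2, 1, 8), codebooks))  # -> [10.0, 14.0]
```

Under these assumptions a word stores only its four small integers instead of a dense vector, and the shared codebooks are the only floating-point parameters, which is where the compression comes from.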
Towards Good Practices for Deep 3D Hand Pose Estimation
Title | Towards Good Practices for Deep 3D Hand Pose Estimation |
Authors | Hengkai Guo, Guijin Wang, Xinghao Chen, Cairong Zhang |
Abstract | 3D hand pose estimation from a single depth image is an important and challenging problem for human-computer interaction. Recently, deep convolutional networks (ConvNets) with sophisticated designs have been employed to address it, but the improvement over traditional random-forest-based methods is not so apparent. To explore good practices and improve performance for hand pose estimation, we propose a tree-structured Region Ensemble Network (REN) for direct 3D coordinate regression. It first partitions the last convolution outputs of the ConvNet into several grid regions. The results from separate fully-connected (FC) regressors on each region are then integrated by another FC layer to perform the estimation. By exploiting several training strategies, including data augmentation and smooth $L_1$ loss, the proposed REN significantly improves the ability of the ConvNet to localize hand joints. The experimental results demonstrate that our approach achieves the best performance among state-of-the-art algorithms on three public hand pose datasets. We also evaluate our method on fingertip detection and human pose datasets and obtain state-of-the-art accuracy. |
Tasks | Data Augmentation, Hand Pose Estimation, Pose Estimation |
Published | 2017-07-23 |
URL | http://arxiv.org/abs/1707.07248v1, http://arxiv.org/pdf/1707.07248v1.pdf |
PWC | https://paperswithcode.com/paper/towards-good-practices-for-deep-3d-hand-pose |
Repo | https://github.com/guohengkai/region-ensemble-network |
Framework | none |
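The smooth $L_1$ loss mentioned in the abstract is, in its commonly used form, quadratic for small residuals and linear for large ones. The sketch below uses that common definition with the switch point at 1; whether the paper uses exactly this variant is an assumption, and the coordinate values are illustrative.

```python
def smooth_l1(residual):
    """Common smooth L1: 0.5 * r^2 if |r| < 1, otherwise |r| - 0.5."""
    abs_r = abs(residual)
    if abs_r < 1.0:
        return 0.5 * abs_r * abs_r
    return abs_r - 0.5


def smooth_l1_loss(predicted, target):
    """Mean smooth L1 over paired coordinates (e.g. flattened 3D joints)."""
    total = sum(smooth_l1(p - t) for p, t in zip(predicted, target))
    return total / len(predicted)


print(smooth_l1(0.5))                             # -> 0.125
print(smooth_l1(-2.0))                            # -> 1.5
print(smooth_l1_loss([0.5, 3.0], [0.0, 1.0]))     # -> 0.8125
```

The linear tail keeps a single badly localized joint from dominating the gradient, which is the usual motivation for preferring it over a plain squared error in coordinate regression.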
Deep Learning for Precipitation Nowcasting: A Benchmark and A New Model
Title | Deep Learning for Precipitation Nowcasting: A Benchmark and A New Model |
Authors | Xingjian Shi, Zhihan Gao, Leonard Lausen, Hao Wang, Dit-Yan Yeung, Wai-kin Wong, Wang-chun Woo |
Abstract | With the goal of making high-resolution forecasts of regional rainfall, precipitation nowcasting has become an important and fundamental technology underlying various public services ranging from rainstorm warnings to flight safety. Recently, the Convolutional LSTM (ConvLSTM) model has been shown to outperform traditional optical flow based methods for precipitation nowcasting, suggesting that deep learning models have a huge potential for solving the problem. However, the convolutional recurrence structure in ConvLSTM-based models is location-invariant while natural motion and transformation (e.g., rotation) are location-variant in general. Furthermore, since deep-learning-based precipitation nowcasting is a newly emerging area, clear evaluation protocols have not yet been established. To address these problems, we propose both a new model and a benchmark for precipitation nowcasting. Specifically, we go beyond ConvLSTM and propose the Trajectory GRU (TrajGRU) model that can actively learn the location-variant structure for recurrent connections. Besides, we provide a benchmark that includes a real-world large-scale dataset from the Hong Kong Observatory, a new training loss, and a comprehensive evaluation protocol to facilitate future research and gauge the state of the art. |
Tasks | Optical Flow Estimation |
Published | 2017-06-12 |
URL | http://arxiv.org/abs/1706.03458v2, http://arxiv.org/pdf/1706.03458v2.pdf |
PWC | https://paperswithcode.com/paper/deep-learning-for-precipitation-nowcasting-a |
Repo | https://github.com/tianhai123/conv-gru |
Framework | pytorch |
Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting
Title | Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting |
Authors | Yaguang Li, Rose Yu, Cyrus Shahabi, Yan Liu |
Abstract | Spatiotemporal forecasting has various applications in the neuroscience, climate, and transportation domains. Traffic forecasting is one canonical example of such a learning task. The task is challenging due to (1) complex spatial dependency on road networks, (2) non-linear temporal dynamics with changing road conditions and (3) inherent difficulty of long-term forecasting. To address these challenges, we propose to model the traffic flow as a diffusion process on a directed graph and introduce Diffusion Convolutional Recurrent Neural Network (DCRNN), a deep learning framework for traffic forecasting that incorporates both spatial and temporal dependency in the traffic flow. Specifically, DCRNN captures the spatial dependency using bidirectional random walks on the graph, and the temporal dependency using the encoder-decoder architecture with scheduled sampling. We evaluate the framework on two real-world large scale road network traffic datasets and observe consistent improvement of 12% - 15% over state-of-the-art baselines. |
Tasks | Multivariate Time Series Forecasting, Spatio-Temporal Forecasting, Time Series Forecasting, Time Series Prediction, Traffic Prediction |
Published | 2017-07-06 |
URL | http://arxiv.org/abs/1707.01926v3, http://arxiv.org/pdf/1707.01926v3.pdf |
PWC | https://paperswithcode.com/paper/diffusion-convolutional-recurrent-neural |
Repo | https://github.com/liyaguang/DCRNN |
Framework | tf |
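The "bidirectional random walks on the graph" in the abstract can be illustrated with a small diffusion step over a directed graph. The sketch assumes the forward walk uses the out-degree-normalized weight matrix, the backward walk uses the in-degree-normalized transpose, and each diffusion step k has its own pair of scalar weights; these details go beyond the abstract, and the graph and weights are made up. The recurrent encoder-decoder is not shown.

```python
def row_normalize(matrix):
    """Divide each row by its sum to get random-walk transition probabilities."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([v / total if total else 0.0 for v in row])
    return normalized


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def mat_vec(matrix, vector):
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def diffusion_conv(weights, signal, theta_fwd, theta_bwd):
    """Sum over steps k of theta_fwd[k] * P_f^k x + theta_bwd[k] * P_b^k x."""
    forward = row_normalize(weights)               # walk along edge directions
    backward = row_normalize(transpose(weights))   # walk against edge directions
    out = [0.0] * len(signal)
    x_f, x_b = list(signal), list(signal)
    for t_f, t_b in zip(theta_fwd, theta_bwd):
        out = [o + t_f * f + t_b * b for o, f, b in zip(out, x_f, x_b)]
        x_f = mat_vec(forward, x_f)
        x_b = mat_vec(backward, x_b)
    return out


# Hypothetical 3-node directed cycle 0 -> 1 -> 2 -> 0; weights[i][j] is edge i -> j.
weights = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
signal = [1.0, 0.0, 0.0]  # e.g. a traffic reading present only at node 0
print(diffusion_conv(weights, signal, theta_fwd=[1.0, 0.5], theta_bwd=[0.0, 0.25]))
# -> [1.0, 0.25, 0.5]
```

Using both directions lets a node draw on readings from the nodes it feeds into as well as the nodes that feed into it, which matters on a directed road network.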
Simple Cortex: A Model of Cells in the Sensory Nervous System
Title | Simple Cortex: A Model of Cells in the Sensory Nervous System |
Authors | David Di Giorgio |
Abstract | Neuroscience research has produced many theories and computational neural models of sensory nervous systems. Notwithstanding many different perspectives towards developing intelligent machines, artificial intelligence has ultimately been influenced by neuroscience. Therefore, this paper provides an introduction to biologically inspired machine intelligence by exploring the basic principles of sensation and perception as well as the structure and behavior of biological sensory nervous systems like the neocortex. Concepts like spike timing, synaptic plasticity, inhibition, neural structure, and neural behavior are applied to a new model, Simple Cortex (SC). A software implementation of SC has been built and demonstrates fast observation, learning, and prediction of spatio-temporal sensory-motor patterns and sequences. Finally, this paper suggests future areas of improvement and growth for Simple Cortex and other related machine intelligence models. |
Tasks | |
Published | 2017-10-03 |
URL | http://arxiv.org/abs/1710.01347v1, http://arxiv.org/pdf/1710.01347v1.pdf |
PWC | https://paperswithcode.com/paper/simple-cortex-a-model-of-cells-in-the-sensory |
Repo | https://github.com/ddigiorg/simple-cortex |
Framework | none |
Semantic Instance Segmentation via Deep Metric Learning
Title | Semantic Instance Segmentation via Deep Metric Learning |
Authors | Alireza Fathi, Zbigniew Wojna, Vivek Rathod, Peng Wang, Hyun Oh Song, Sergio Guadarrama, Kevin P. Murphy |
Abstract | We propose a new method for semantic instance segmentation, by first computing how likely two pixels are to belong to the same object, and then by grouping similar pixels together. Our similarity metric is based on a deep, fully convolutional embedding model. Our grouping method is based on selecting all points that are sufficiently similar to a set of “seed points”, chosen from a deep, fully convolutional scoring model. We show competitive results on the Pascal VOC instance segmentation benchmark. |
Tasks | Instance Segmentation, Metric Learning, Object Proposal Generation, Semantic Segmentation |
Published | 2017-03-30 |
URL | http://arxiv.org/abs/1703.10277v1, http://arxiv.org/pdf/1703.10277v1.pdf |
PWC | https://paperswithcode.com/paper/semantic-instance-segmentation-via-deep |
Repo | https://github.com/alicranck/instance-seg |
Framework | pytorch |
READ-BAD: A New Dataset and Evaluation Scheme for Baseline Detection in Archival Documents
Title | READ-BAD: A New Dataset and Evaluation Scheme for Baseline Detection in Archival Documents |
Authors | Tobias Grüning, Roger Labahn, Markus Diem, Florian Kleber, Stefan Fiel |
Abstract | Text line detection is crucial for any application associated with Automatic Text Recognition or Keyword Spotting. Modern algorithms perform well on well-established datasets since they either comprise clean data or simple/homogeneous page layouts. We have collected and annotated 2036 archival document images from different locations and time periods. The dataset contains varying page layouts and degradations that challenge text line segmentation methods. Well-established text line segmentation evaluation schemes such as the Detection Rate or Recognition Accuracy require binarized data that is annotated on a pixel level. Producing ground truth by these means is laborious and not needed to determine a method’s quality. In this paper we propose a new evaluation scheme that is based on baselines. The proposed scheme has no need for binarization and it can handle skewed as well as rotated text lines. The ICDAR 2017 Competition on Baseline Detection and the ICDAR 2017 Competition on Layout Analysis for Challenging Medieval Manuscripts used this evaluation scheme. Finally, we present results achieved by a recently published text line detection algorithm. |
Tasks | Keyword Spotting |
Published | 2017-05-09 |
URL | http://arxiv.org/abs/1705.03311v2, http://arxiv.org/pdf/1705.03311v2.pdf |
PWC | https://paperswithcode.com/paper/read-bad-a-new-dataset-and-evaluation-scheme |
Repo | https://github.com/Transkribus/TranskribusBaseLineEvaluationScheme |
Framework | none |
Kafnets: kernel-based non-parametric activation functions for neural networks
Title | Kafnets: kernel-based non-parametric activation functions for neural networks |
Authors | Simone Scardapane, Steven Van Vaerenbergh, Simone Totaro, Aurelio Uncini |
Abstract | Neural networks are generally built by interleaving (adaptable) linear layers with (fixed) nonlinear activation functions. To increase their flexibility, several authors have proposed methods for adapting the activation functions themselves, endowing them with varying degrees of flexibility. None of these approaches, however, have gained wide acceptance in practice, and research on this topic remains open. In this paper, we introduce a novel family of flexible activation functions that are based on an inexpensive kernel expansion at every neuron. Leveraging several properties of kernel-based models, we propose multiple variations for designing and initializing these kernel activation functions (KAFs), including a multidimensional scheme that allows information from different paths in the network to be combined nonlinearly. The resulting KAFs can approximate any mapping defined over a subset of the real line, either convex or nonconvex. Furthermore, they are smooth over their entire domain, linear in their parameters, and they can be regularized using any known scheme, including the use of $\ell_1$ penalties to enforce sparseness. To the best of our knowledge, no other known model satisfies all these properties simultaneously. In addition, we provide a relatively complete overview of alternative techniques for adapting the activation functions, which is currently lacking in the literature. A large set of experiments validates our proposal. |
Tasks | |
Published | 2017-07-13 |
URL | http://arxiv.org/abs/1707.04035v2, http://arxiv.org/pdf/1707.04035v2.pdf |
PWC | https://paperswithcode.com/paper/kafnets-kernel-based-non-parametric |
Repo | https://github.com/ispamm/kernel-activation-functions |
Framework | pytorch |
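A kernel expansion at a single neuron, as the abstract describes it, can be sketched as a weighted sum of kernel evaluations against a small dictionary of points. The sketch assumes a Gaussian kernel, a fixed and evenly spaced dictionary, and a bandwidth `gamma`; these choices and all names are illustrative assumptions rather than details given in the abstract.

```python
import math


def make_dictionary(size, low=-2.0, high=2.0):
    """Evenly spaced dictionary points on [low, high] (assumed to stay fixed)."""
    step = (high - low) / (size - 1)
    return [low + i * step for i in range(size)]


def kaf(s, alphas, dictionary, gamma=1.0):
    """Kernel activation: sum_i alphas[i] * exp(-gamma * (s - d_i)^2).

    Only the mixing coefficients `alphas` would be trained, so the output
    is linear in the trainable parameters.
    """
    return sum(a * math.exp(-gamma * (s - d) ** 2)
               for a, d in zip(alphas, dictionary))


dictionary = make_dictionary(5)        # [-2.0, -1.0, 0.0, 1.0, 2.0]
alphas = [0.0, 0.0, 1.0, 0.0, 0.0]     # a single bump centred at 0
print(kaf(0.0, alphas, dictionary))    # -> 1.0
```

Because the activation is linear in `alphas`, a penalty such as the $\ell_1$ norm mentioned in the abstract can be applied to the coefficients directly.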
Dynamic Evaluation of Neural Sequence Models
Title | Dynamic Evaluation of Neural Sequence Models |
Authors | Ben Krause, Emmanuel Kahembwe, Iain Murray, Steve Renals |
Abstract | We present a methodology for using dynamic evaluation to improve neural sequence models. Models are adapted to recent history via a gradient descent based mechanism, causing them to assign higher probabilities to re-occurring sequential patterns. Dynamic evaluation outperforms existing adaptation approaches in our comparisons. Dynamic evaluation improves the state-of-the-art word-level perplexities on the Penn Treebank and WikiText-2 datasets to 51.1 and 44.3 respectively, and the state-of-the-art character-level cross-entropies on the text8 and Hutter Prize datasets to 1.19 bits/char and 1.08 bits/char respectively. |
Tasks | Language Modelling |
Published | 2017-09-21 |
URL | http://arxiv.org/abs/1709.07432v2, http://arxiv.org/pdf/1709.07432v2.pdf |
PWC | https://paperswithcode.com/paper/dynamic-evaluation-of-neural-sequence-models |
Repo | https://github.com/benkrause/dynamiceval-transformer |
Framework | tf |
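The core idea in the abstract, adapting the model to recent history by gradient descent while it is being evaluated, can be shown on a deliberately tiny stand-in model. The sketch below is a toy under strong assumptions: the "model" is a context-free softmax over a 4-token vocabulary rather than a neural sequence model, the update is plain SGD on each token, and the sequence is made up.

```python
import math


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def evaluate(tokens, vocab_size, learning_rate=0.0):
    """Average negative log-likelihood of `tokens`.

    With learning_rate > 0 the logits take one gradient step on each token
    after it has been scored (dynamic evaluation); with learning_rate == 0
    the model stays fixed (static evaluation).
    """
    logits = [0.0] * vocab_size
    total_loss = 0.0
    for token in tokens:
        probs = softmax(logits)
        total_loss += -math.log(probs[token])  # score first, adapt afterwards
        for i in range(vocab_size):
            # Gradient of -log p(token) with respect to logit i.
            grad = probs[i] - (1.0 if i == token else 0.0)
            logits[i] -= learning_rate * grad
    return total_loss / len(tokens)


history = [0, 1] * 10  # a re-occurring pattern that uses 2 of the 4 tokens
static_loss = evaluate(history, vocab_size=4)
dynamic_loss = evaluate(history, vocab_size=4, learning_rate=0.5)
print(static_loss, dynamic_loss)
```

Each token is scored before the model is updated on it, so the adapted model never sees the token it is being evaluated on; on this repetitive sequence the adapted run ends with a lower average loss than the static one.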
Fast YOLO: A Fast You Only Look Once System for Real-time Embedded Object Detection in Video
Title | Fast YOLO: A Fast You Only Look Once System for Real-time Embedded Object Detection in Video |
Authors | Mohammad Javad Shafiee, Brendan Chywl, Francis Li, Alexander Wong |
Abstract | Object detection is considered one of the most challenging problems in the field of computer vision, as it involves the combination of object classification and object localization within a scene. Recently, deep neural networks (DNNs) have been demonstrated to achieve superior object detection performance compared to other approaches, with YOLOv2 (an improved You Only Look Once model) being one of the state-of-the-art DNN-based object detection methods in terms of both speed and accuracy. Although YOLOv2 can achieve real-time performance on a powerful GPU, it remains very challenging to leverage this approach for real-time object detection in video on embedded computing devices with limited computational power and limited memory. In this paper, we propose a new framework called Fast YOLO, a fast You Only Look Once framework which accelerates YOLOv2 to be able to perform object detection in video on embedded devices in a real-time manner. First, we leverage the evolutionary deep intelligence framework to evolve the YOLOv2 network architecture and produce an optimized architecture (referred to as O-YOLOv2 here) that has 2.8X fewer parameters with just a ~2% IOU drop. To further reduce power consumption on embedded devices while maintaining performance, a motion-adaptive inference method is introduced into the proposed Fast YOLO framework to reduce the frequency of deep inference with O-YOLOv2 based on temporal motion characteristics. Experimental results show that the proposed Fast YOLO framework can reduce the number of deep inferences by an average of 38.13% and achieve an average speedup of ~3.3X for object detection in video compared to the original YOLOv2, leading Fast YOLO to run at an average of ~18FPS on an Nvidia Jetson TX1 embedded system. |
Tasks | Object Classification, Object Detection, Object Localization, Real-Time Object Detection |
Published | 2017-09-18 |
URL | http://arxiv.org/abs/1709.05943v1, http://arxiv.org/pdf/1709.05943v1.pdf |
PWC | https://paperswithcode.com/paper/fast-yolo-a-fast-you-only-look-once-system |
Repo | https://github.com/Vonski/wdi19 |
Framework | none |
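The motion-adaptive inference idea in the abstract, running the deep detector only when the scene has changed enough, can be sketched as a simple gating loop. The motion test here (mean absolute pixel difference against the last frame that was processed, with a fixed threshold) is a stand-in assumption, not the paper's motion measure, and `fake_detector` is a hypothetical placeholder for the O-YOLOv2 network.

```python
def mean_abs_diff(frame_a, frame_b):
    """Average absolute difference between two frames given as flat pixel lists."""
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)


def motion_adaptive_inference(frames, detector, threshold=10.0):
    """Run `detector` only on frames that differ enough from the reference.

    Returns the per-frame results and the number of detector calls made.
    """
    results = []
    reference = None  # last frame the detector actually processed
    cached = None     # detections reused while the scene is static
    deep_runs = 0
    for frame in frames:
        if reference is None or mean_abs_diff(frame, reference) > threshold:
            cached = detector(frame)
            reference = frame
            deep_runs += 1
        results.append(cached)
    return results, deep_runs


def fake_detector(frame):
    """Hypothetical placeholder: returns the pixel sum instead of detections."""
    return sum(frame)


frames = [[0, 0, 0, 0], [1, 0, 0, 0], [100, 100, 0, 0], [100, 100, 1, 0]]
print(motion_adaptive_inference(frames, fake_detector))
# -> ([0, 0, 200, 200], 2)
```

In this toy run the detector is called on two of four frames; the saving depends entirely on how static the video is and on the chosen threshold.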