July 29, 2019


Paper Group AWR 131

Learning Latent Super-Events to Detect Multiple Activities in Videos. NeuralFDR: Learning Discovery Thresholds from Hypothesis Features. Fast Threshold Tests for Detecting Discrimination. DeepUNet: A Deep Fully Convolutional Network for Pixel-level Sea-Land Segmentation. SMSSVD - SubMatrix Selection Singular Value Decomposition. Compressing Word Em …

Learning Latent Super-Events to Detect Multiple Activities in Videos


Title	Learning Latent Super-Events to Detect Multiple Activities in Videos
Authors	AJ Piergiovanni, Michael S. Ryoo
Abstract	In this paper, we introduce the concept of learning latent super-events from activity videos, and present how it benefits activity detection in continuous videos. We define a super-event as a set of multiple events occurring together in videos with a particular temporal organization; it is the opposite concept of sub-events. Real-world videos contain multiple activities and are rarely segmented (e.g., surveillance videos), and learning latent super-events allows the model to capture how the events are temporally related in videos. We design temporal structure filters that enable the model to focus on particular sub-intervals of the videos, and use them together with a soft attention mechanism to learn representations of latent super-events. Super-event representations are combined with per-frame or per-segment CNNs to provide frame-level annotations. Our approach is designed to be fully differentiable, enabling end-to-end learning of latent super-event representations jointly with the activity detector using them. Our experiments with multiple public video datasets confirm that the proposed concept of latent super-event learning significantly benefits activity detection, advancing the state of the art.
Tasks	Action Detection, Activity Detection
Published	2017-12-05
URL	http://arxiv.org/abs/1712.01938v2
PDF	http://arxiv.org/pdf/1712.01938v2.pdf
PWC	https://paperswithcode.com/paper/learning-latent-super-events-to-detect
Repo	https://github.com/piergiaj/super-events-cvpr18
Framework	pytorch

NeuralFDR: Learning Discovery Thresholds from Hypothesis Features


Title	NeuralFDR: Learning Discovery Thresholds from Hypothesis Features
Authors	Fei Xia, Martin J. Zhang, James Zou, David Tse
Abstract	As datasets grow richer, an important challenge is to leverage the full features in the data to maximize the number of useful discoveries while controlling for false positives. We address this problem in the context of multiple hypothesis testing, where for each hypothesis, we observe a p-value along with a set of features specific to that hypothesis. For example, in genetic association studies, each hypothesis tests the correlation between a variant and the trait. We have a rich set of features for each variant (e.g., its location, conservation, and epigenetics) which could inform how likely the variant is to have a true association. However, popular testing approaches, such as Benjamini-Hochberg’s procedure (BH) and independent hypothesis weighting (IHW), either ignore these features or assume that the features are categorical or univariate. We propose a new algorithm, NeuralFDR, which automatically learns a discovery threshold as a function of all the hypothesis features. We parametrize the discovery threshold as a neural network, which enables flexible handling of multi-dimensional discrete and continuous features as well as efficient end-to-end optimization. We prove that NeuralFDR has strong false discovery rate (FDR) guarantees, and show that it makes substantially more discoveries in synthetic and real datasets. Moreover, we demonstrate that the learned discovery threshold is directly interpretable.
Tasks
Published	2017-11-03
URL	http://arxiv.org/abs/1711.01312v4
PDF	http://arxiv.org/pdf/1711.01312v4.pdf
PWC	https://paperswithcode.com/paper/neuralfdr-learning-discovery-thresholds-from
Repo	https://github.com/fxia22/NeuralFDR
Framework	pytorch
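
The abstract above names the Benjamini-Hochberg (BH) procedure as a baseline that ignores hypothesis features. For orientation, here is a minimal sketch of that baseline only, not of NeuralFDR itself. The function name, the example p-values, and the choice of `alpha` are illustrative assumptions.

```python
def benjamini_hochberg(p_values, alpha=0.1):
    """Return the sorted indices of hypotheses rejected by the BH step-up rule."""
    m = len(p_values)
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, idx in enumerate(order, start=1):
        # Track the largest rank whose p-value is under its BH threshold.
        if p_values[idx] <= alpha * rank / m:
            k = rank
    # Reject the k hypotheses with the smallest p-values.
    return sorted(order[:k])


# Hypothetical p-values, one per hypothesis.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
rejected = benjamini_hochberg(p, alpha=0.05)
```

BH applies one threshold schedule to every hypothesis regardless of its features; the abstract describes NeuralFDR as replacing this with a threshold that is a learned function of those features.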

Fast Threshold Tests for Detecting Discrimination


Title	Fast Threshold Tests for Detecting Discrimination
Authors	Emma Pierson, Sam Corbett-Davies, Sharad Goel
Abstract	Threshold tests have recently been proposed as a useful method for detecting bias in lending, hiring, and policing decisions. For example, in the case of credit extensions, these tests aim to estimate the bar for granting loans to white and minority applicants, with a higher inferred threshold for minorities indicative of discrimination. This technique, however, requires fitting a complex Bayesian latent variable model for which inference is often computationally challenging. Here we develop a method for fitting threshold tests that is two orders of magnitude faster than the existing approach, reducing computation from hours to minutes. To achieve these performance gains, we introduce and analyze a flexible family of probability distributions on the interval [0, 1] – which we call discriminant distributions – that is computationally efficient to work with. We demonstrate our technique by analyzing 2.7 million police stops of pedestrians in New York City.
Tasks
Published	2017-02-27
URL	http://arxiv.org/abs/1702.08536v3
PDF	http://arxiv.org/pdf/1702.08536v3.pdf
PWC	https://paperswithcode.com/paper/fast-threshold-tests-for-detecting
Repo	https://github.com/5harad/fasttt
Framework	none

DeepUNet: A Deep Fully Convolutional Network for Pixel-level Sea-Land Segmentation


Title	DeepUNet: A Deep Fully Convolutional Network for Pixel-level Sea-Land Segmentation
Authors	Ruirui Li, Wenjie Liu, Lei Yang, Shihao Sun, Wei Hu, Fan Zhang, Wei Li
Abstract	Semantic segmentation is a fundamental research topic in remote sensing image processing. Because of the complex maritime environment, sea-land segmentation is a challenging task. Although neural networks have achieved excellent performance in semantic segmentation in recent years, few works use CNNs for sea-land segmentation, and the results could be further improved. This paper proposes a novel deep convolutional neural network named DeepUNet. Like the U-Net, its structure has a contracting path and an expansive path to produce high-resolution output. Unlike the U-Net, DeepUNet uses DownBlocks instead of convolution layers in the contracting path and UpBlocks in the expansive path. The two novel blocks bring two new connections, the U-connection and the Plus connection, which are introduced to obtain more precise segmentation results. To verify our network architecture, we built a new challenging sea-land dataset and compared DeepUNet on it with SegNet and U-Net. Experimental results show that DeepUNet achieves good performance compared with other architectures, especially on high-resolution remote sensing imagery.
Tasks	Semantic Segmentation
Published	2017-09-01
URL	http://arxiv.org/abs/1709.00201v1
PDF	http://arxiv.org/pdf/1709.00201v1.pdf
PWC	https://paperswithcode.com/paper/deepunet-a-deep-fully-convolutional-network
Repo	https://github.com/Lkruitwagen/remote-sensing-solar-pv
Framework	none

SMSSVD - SubMatrix Selection Singular Value Decomposition


Title	SMSSVD - SubMatrix Selection Singular Value Decomposition
Authors	Rasmus Henningsson, Magnus Fontes
Abstract	High throughput biomedical measurements normally capture multiple overlaid biologically relevant signals and often also signals representing different types of technical artefacts, such as batch effects. Signal identification and decomposition are accordingly main objectives in statistical biomedical modeling and data analysis. Existing methods, aimed at signal reconstruction and deconvolution, in general, are either supervised, contain parameters that need to be estimated or present other types of ad hoc features. We here introduce SubMatrix Selection Singular Value Decomposition (SMSSVD), a parameter-free unsupervised signal decomposition and dimension reduction method, designed to reduce noise, adaptively for each low-rank signal in a given data matrix, and represent the signals in the data in a way that enables unbiased exploratory analysis and reconstruction of multiple overlaid signals, including identifying groups of variables that drive different signals. The SubMatrix Selection Singular Value Decomposition (SMSSVD) method produces a denoised signal decomposition from a given data matrix. The SMSSVD method guarantees orthogonality between signal components in a straightforward manner and it is designed to make automation possible. We illustrate SMSSVD by applying it to several real and synthetic datasets and compare its performance to gold-standard methods like PCA (Principal Component Analysis) and SPC (Sparse Principal Components, using Lasso constraints). SMSSVD is computationally efficient and, despite being a parameter-free method, in general outperforms existing statistical learning methods. A Julia implementation of SMSSVD is openly available on GitHub (https://github.com/rasmushenningsson/SMSSVD.jl).
Tasks	Dimensionality Reduction
Published	2017-10-23
URL	http://arxiv.org/abs/1710.08144v1
PDF	http://arxiv.org/pdf/1710.08144v1.pdf
PWC	https://paperswithcode.com/paper/smssvd-submatrix-selection-singular-value
Repo	https://github.com/rasmushenningsson/SMSSVD.jl
Framework	none

Compressing Word Embeddings via Deep Compositional Code Learning


Title	Compressing Word Embeddings via Deep Compositional Code Learning
Authors	Raphael Shu, Hideki Nakayama
Abstract	Natural language processing (NLP) models often require a massive number of parameters for word embeddings, resulting in a large storage or memory footprint. Deploying neural NLP models to mobile devices requires compressing the word embeddings without any significant sacrifices in performance. For this purpose, we propose to construct the embeddings with few basis vectors. For each word, the composition of basis vectors is determined by a hash code. To maximize the compression rate, we adopt the multi-codebook quantization approach instead of a binary coding scheme. Each code is composed of multiple discrete numbers, such as (3, 2, 1, 8), where the value of each component is limited to a fixed range. We propose to directly learn the discrete codes in an end-to-end neural network by applying the Gumbel-softmax trick. Experiments show the compression rate achieves 98% in a sentiment analysis task and 94% ~ 99% in machine translation tasks without performance loss. In both tasks, the proposed method can improve the model performance by slightly lowering the compression rate. Compared to other approaches such as character-level segmentation, the proposed method is language-independent and does not require modifications to the network architecture.
Tasks	Machine Translation, Quantization, Sentiment Analysis, Word Embeddings
Published	2017-11-03
URL	http://arxiv.org/abs/1711.01068v2
PDF	http://arxiv.org/pdf/1711.01068v2.pdf
PWC	https://paperswithcode.com/paper/compressing-word-embeddings-via-deep
Repo	https://github.com/zomux/neuralcompressor
Framework	tf
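
To make the idea of a compositional code concrete, the sketch below rebuilds one word vector from a discrete code and a set of small codebooks. It assumes the basis vectors selected by the code are combined by summation and uses 0-based code values; the function name and the toy codebooks are hypothetical, and learning the codes (the Gumbel-softmax part) is not shown.

```python
def compose_embedding(code, codebooks):
    """Sum one basis vector per codebook, selected by the word's discrete code.

    code      -- tuple of integers, one index per codebook (0-based here)
    codebooks -- list of codebooks; each codebook is a list of basis vectors
    """
    dim = len(codebooks[0][0])
    vector = [0.0] * dim
    for book, index in zip(codebooks, code):
        basis = book[index]
        for d in range(dim):
            vector[d] += basis[d]
    return vector


# Toy setup (illustrative values): 2 codebooks, 2 basis vectors each, dimension 2.
codebooks = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5], [2.0, 2.0]],
]
word_vector = compose_embedding((1, 0), codebooks)
```

Under this scheme each word stores only its short integer code, and the codebooks are shared across the vocabulary, which is where the storage saving comes from.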

Towards Good Practices for Deep 3D Hand Pose Estimation


Title	Towards Good Practices for Deep 3D Hand Pose Estimation
Authors	Hengkai Guo, Guijin Wang, Xinghao Chen, Cairong Zhang
Abstract	3D hand pose estimation from a single depth image is an important and challenging problem for human-computer interaction. Recently deep convolutional networks (ConvNet) with sophisticated design have been employed to address it, but the improvement over traditional random forest based methods is not so apparent. To identify good practices and improve the performance of hand pose estimation, we propose a tree-structured Region Ensemble Network (REN) for direct 3D coordinate regression. It first partitions the last convolution outputs of ConvNet into several grid regions. The results from separate fully-connected (FC) regressors on each region are then integrated by another FC layer to perform the estimation. By exploiting several training strategies including data augmentation and smooth $L_1$ loss, the proposed REN can significantly improve the performance of ConvNet to localize hand joints. The experimental results demonstrate that our approach achieves the best performance among state-of-the-art algorithms on three public hand pose datasets. We also evaluate our method on fingertip detection and human pose datasets and obtain state-of-the-art accuracy.
Tasks	Data Augmentation, Hand Pose Estimation, Pose Estimation
Published	2017-07-23
URL	http://arxiv.org/abs/1707.07248v1
PDF	http://arxiv.org/pdf/1707.07248v1.pdf
PWC	https://paperswithcode.com/paper/towards-good-practices-for-deep-3d-hand-pose
Repo	https://github.com/guohengkai/region-ensemble-network
Framework	none
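
The abstract above lists smooth $L_1$ loss among its training strategies. A minimal sketch of the commonly used definition follows (quadratic for residuals smaller than 1 in magnitude, linear beyond that). The transition point of 1 and the averaging over coordinates are assumptions and may differ from the authors' configuration.

```python
def smooth_l1(residual):
    """Smooth L1 penalty for one residual: quadratic near zero, linear in the tails."""
    a = abs(residual)
    return 0.5 * a * a if a < 1.0 else a - 0.5


def smooth_l1_loss(predicted, target):
    """Mean smooth L1 penalty over paired coordinate values."""
    return sum(smooth_l1(p - t) for p, t in zip(predicted, target)) / len(predicted)


# Hypothetical joint coordinates: one small error and one large error.
loss = smooth_l1_loss([0.5, 3.0], [0.0, 1.0])
```

Compared with a squared loss, the linear tail keeps large regression errors from dominating the gradient.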

Deep Learning for Precipitation Nowcasting: A Benchmark and A New Model


Title	Deep Learning for Precipitation Nowcasting: A Benchmark and A New Model
Authors	Xingjian Shi, Zhihan Gao, Leonard Lausen, Hao Wang, Dit-Yan Yeung, Wai-kin Wong, Wang-chun Woo
Abstract	With the goal of making high-resolution forecasts of regional rainfall, precipitation nowcasting has become an important and fundamental technology underlying various public services ranging from rainstorm warnings to flight safety. Recently, the Convolutional LSTM (ConvLSTM) model has been shown to outperform traditional optical flow based methods for precipitation nowcasting, suggesting that deep learning models have a huge potential for solving the problem. However, the convolutional recurrence structure in ConvLSTM-based models is location-invariant while natural motion and transformation (e.g., rotation) are location-variant in general. Furthermore, since deep-learning-based precipitation nowcasting is a newly emerging area, clear evaluation protocols have not yet been established. To address these problems, we propose both a new model and a benchmark for precipitation nowcasting. Specifically, we go beyond ConvLSTM and propose the Trajectory GRU (TrajGRU) model that can actively learn the location-variant structure for recurrent connections. Besides, we provide a benchmark that includes a real-world large-scale dataset from the Hong Kong Observatory, a new training loss, and a comprehensive evaluation protocol to facilitate future research and gauge the state of the art.
Tasks	Optical Flow Estimation
Published	2017-06-12
URL	http://arxiv.org/abs/1706.03458v2
PDF	http://arxiv.org/pdf/1706.03458v2.pdf
PWC	https://paperswithcode.com/paper/deep-learning-for-precipitation-nowcasting-a
Repo	https://github.com/tianhai123/conv-gru
Framework	pytorch

Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting


Title	Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting
Authors	Yaguang Li, Rose Yu, Cyrus Shahabi, Yan Liu
Abstract	Spatiotemporal forecasting has various applications in the neuroscience, climate, and transportation domains. Traffic forecasting is one canonical example of such a learning task. The task is challenging due to (1) complex spatial dependency on road networks, (2) non-linear temporal dynamics with changing road conditions and (3) inherent difficulty of long-term forecasting. To address these challenges, we propose to model the traffic flow as a diffusion process on a directed graph and introduce Diffusion Convolutional Recurrent Neural Network (DCRNN), a deep learning framework for traffic forecasting that incorporates both spatial and temporal dependency in the traffic flow. Specifically, DCRNN captures the spatial dependency using bidirectional random walks on the graph, and the temporal dependency using the encoder-decoder architecture with scheduled sampling. We evaluate the framework on two real-world large scale road network traffic datasets and observe consistent improvement of 12% - 15% over state-of-the-art baselines.
Tasks	Multivariate Time Series Forecasting, Spatio-Temporal Forecasting, Time Series Forecasting, Time Series Prediction, Traffic Prediction
Published	2017-07-06
URL	http://arxiv.org/abs/1707.01926v3
PDF	http://arxiv.org/pdf/1707.01926v3.pdf
PWC	https://paperswithcode.com/paper/diffusion-convolutional-recurrent-neural
Repo	https://github.com/liyaguang/DCRNN
Framework	tf
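
The phrase "bidirectional random walks on the graph" can be illustrated with a small diffusion step on a directed, weighted adjacency matrix: one walk follows edges forward (rows normalized by out-degree) and the other follows them backward (the transposed matrix normalized by in-degree). The sketch below is a simplified single-feature version under those assumptions; the function names and filter weights are hypothetical, and the recurrent encoder-decoder is not shown.

```python
def row_normalize(W):
    """Divide each row by its sum to get random-walk transition probabilities."""
    out = []
    for row in W:
        s = sum(row)
        out.append([w / s if s else 0.0 for w in row])
    return out


def transpose(W):
    return [list(col) for col in zip(*W)]


def matvec(P, x):
    return [sum(p * v for p, v in zip(row, x)) for row in P]


def diffusion_conv(W, x, theta_fwd, theta_bwd):
    """Weighted sum of k-step forward and backward diffusions of signal x."""
    P_fwd = row_normalize(W)             # walk along edge direction
    P_bwd = row_normalize(transpose(W))  # walk against edge direction
    out = [0.0] * len(x)
    x_fwd, x_bwd = list(x), list(x)
    for t_f, t_b in zip(theta_fwd, theta_bwd):
        out = [o + t_f * a + t_b * b for o, a, b in zip(out, x_fwd, x_bwd)]
        x_fwd = matvec(P_fwd, x_fwd)     # advance one diffusion step
        x_bwd = matvec(P_bwd, x_bwd)
    return out


# Two sensors with a single directed edge 0 -> 1 (illustrative graph).
W = [[0.0, 1.0], [0.0, 0.0]]
x = [1.0, 2.0]
forward_only = diffusion_conv(W, x, theta_fwd=[0.0, 1.0], theta_bwd=[0.0, 0.0])
backward_only = diffusion_conv(W, x, theta_fwd=[0.0, 0.0], theta_bwd=[0.0, 1.0])
```

On this two-node graph the forward walk moves node 1's value to node 0 and the backward walk moves node 0's value to node 1, which shows why both directions are needed on a directed road network.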

Simple Cortex: A Model of Cells in the Sensory Nervous System


Title	Simple Cortex: A Model of Cells in the Sensory Nervous System
Authors	David Di Giorgio
Abstract	Neuroscience research has produced many theories and computational neural models of sensory nervous systems. Notwithstanding many different perspectives towards developing intelligent machines, artificial intelligence has ultimately been influenced by neuroscience. Therefore, this paper provides an introduction to biologically inspired machine intelligence by exploring the basic principles of sensation and perception as well as the structure and behavior of biological sensory nervous systems like the neocortex. Concepts like spike timing, synaptic plasticity, inhibition, neural structure, and neural behavior are applied to a new model, Simple Cortex (SC). A software implementation of SC has been built and demonstrates fast observation, learning, and prediction of spatio-temporal sensory-motor patterns and sequences. Finally, this paper suggests future areas of improvement and growth for Simple Cortex and other related machine intelligence models.
Tasks
Published	2017-10-03
URL	http://arxiv.org/abs/1710.01347v1
PDF	http://arxiv.org/pdf/1710.01347v1.pdf
PWC	https://paperswithcode.com/paper/simple-cortex-a-model-of-cells-in-the-sensory
Repo	https://github.com/ddigiorg/simple-cortex
Framework	none

Semantic Instance Segmentation via Deep Metric Learning


Title	Semantic Instance Segmentation via Deep Metric Learning
Authors	Alireza Fathi, Zbigniew Wojna, Vivek Rathod, Peng Wang, Hyun Oh Song, Sergio Guadarrama, Kevin P. Murphy
Abstract	We propose a new method for semantic instance segmentation, by first computing how likely two pixels are to belong to the same object, and then by grouping similar pixels together. Our similarity metric is based on a deep, fully convolutional embedding model. Our grouping method is based on selecting all points that are sufficiently similar to a set of “seed points”, chosen from a deep, fully convolutional scoring model. We show competitive results on the Pascal VOC instance segmentation benchmark.
Tasks	Instance Segmentation, Metric Learning, Object Proposal Generation, Semantic Segmentation
Published	2017-03-30
URL	http://arxiv.org/abs/1703.10277v1
PDF	http://arxiv.org/pdf/1703.10277v1.pdf
PWC	https://paperswithcode.com/paper/semantic-instance-segmentation-via-deep
Repo	https://github.com/alicranck/instance-seg
Framework	pytorch

READ-BAD: A New Dataset and Evaluation Scheme for Baseline Detection in Archival Documents


Title	READ-BAD: A New Dataset and Evaluation Scheme for Baseline Detection in Archival Documents
Authors	Tobias Grüning, Roger Labahn, Markus Diem, Florian Kleber, Stefan Fiel
Abstract	Text line detection is crucial for any application associated with Automatic Text Recognition or Keyword Spotting. Modern algorithms perform well on well-established datasets since they either comprise clean data or simple/homogeneous page layouts. We have collected and annotated 2036 archival document images from different locations and time periods. The dataset contains varying page layouts and degradations that challenge text line segmentation methods. Well-established text line segmentation evaluation schemes such as the Detection Rate or Recognition Accuracy demand binarized data that is annotated on a pixel level. Producing ground truth by these means is laborious and not needed to determine a method’s quality. In this paper we propose a new evaluation scheme that is based on baselines. The proposed scheme has no need for binarization and it can handle skewed as well as rotated text lines. The ICDAR 2017 Competition on Baseline Detection and the ICDAR 2017 Competition on Layout Analysis for Challenging Medieval Manuscripts used this evaluation scheme. Finally, we present results achieved by a recently published text line detection algorithm.
Tasks	Keyword Spotting
Published	2017-05-09
URL	http://arxiv.org/abs/1705.03311v2
PDF	http://arxiv.org/pdf/1705.03311v2.pdf
PWC	https://paperswithcode.com/paper/read-bad-a-new-dataset-and-evaluation-scheme
Repo	https://github.com/Transkribus/TranskribusBaseLineEvaluationScheme
Framework	none

Kafnets: kernel-based non-parametric activation functions for neural networks


Title	Kafnets: kernel-based non-parametric activation functions for neural networks
Authors	Simone Scardapane, Steven Van Vaerenbergh, Simone Totaro, Aurelio Uncini
Abstract	Neural networks are generally built by interleaving (adaptable) linear layers with (fixed) nonlinear activation functions. To increase their flexibility, several authors have proposed methods for adapting the activation functions themselves, endowing them with varying degrees of flexibility. None of these approaches, however, has gained wide acceptance in practice, and research on this topic remains open. In this paper, we introduce a novel family of flexible activation functions that are based on an inexpensive kernel expansion at every neuron. Leveraging several properties of kernel-based models, we propose multiple variations for designing and initializing these kernel activation functions (KAFs), including a multidimensional scheme that allows information from different paths in the network to be combined nonlinearly. The resulting KAFs can approximate any mapping defined over a subset of the real line, either convex or nonconvex. Furthermore, they are smooth over their entire domain, linear in their parameters, and they can be regularized using any known scheme, including the use of $\ell_1$ penalties to enforce sparseness. To the best of our knowledge, no other known model satisfies all these properties simultaneously. In addition, we provide a relatively complete overview of alternative techniques for adapting the activation functions, which is currently lacking in the literature. A large set of experiments validates our proposal.
Tasks
Published	2017-07-13
URL	http://arxiv.org/abs/1707.04035v2
PDF	http://arxiv.org/pdf/1707.04035v2.pdf
PWC	https://paperswithcode.com/paper/kafnets-kernel-based-non-parametric
Repo	https://github.com/ispamm/kernel-activation-functions
Framework	pytorch
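
A "kernel expansion at every neuron" can be sketched as a weighted sum of kernel evaluations between the neuron's pre-activation and a set of dictionary points. The example below assumes a Gaussian kernel and a fixed, evenly spaced dictionary; both choices, along with the names and values, are illustrative assumptions and not taken from the abstract.

```python
import math


def kaf(s, alphas, dictionary, gamma=1.0):
    """Kernel activation: sum_i alphas[i] * exp(-gamma * (s - dictionary[i])**2).

    s          -- scalar pre-activation of one neuron
    alphas     -- trainable mixing coefficients (one per dictionary point)
    dictionary -- fixed points on the real line where the kernels are centered
    """
    return sum(a * math.exp(-gamma * (s - d) ** 2)
               for a, d in zip(alphas, dictionary))


# Hypothetical dictionary and coefficients for a single neuron.
dictionary = [-2.0, -1.0, 0.0, 1.0, 2.0]
alphas = [0.1, -0.3, 0.5, 0.8, 1.2]
activation = kaf(0.3, alphas, dictionary)
```

The output is a linear function of `alphas`, which matches the abstract's remark that KAFs are linear in their parameters, and the Gaussian kernel makes the resulting function smooth in `s`.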

Dynamic Evaluation of Neural Sequence Models


Title	Dynamic Evaluation of Neural Sequence Models
Authors	Ben Krause, Emmanuel Kahembwe, Iain Murray, Steve Renals
Abstract	We present methodology for using dynamic evaluation to improve neural sequence models. Models are adapted to recent history via a gradient descent based mechanism, causing them to assign higher probabilities to re-occurring sequential patterns. Dynamic evaluation outperforms existing adaptation approaches in our comparisons. Dynamic evaluation improves the state-of-the-art word-level perplexities on the Penn Treebank and WikiText-2 datasets to 51.1 and 44.3 respectively, and the state-of-the-art character-level cross-entropies on the text8 and Hutter Prize datasets to 1.19 bits/char and 1.08 bits/char respectively.
Tasks	Language Modelling
Published	2017-09-21
URL	http://arxiv.org/abs/1709.07432v2
PDF	http://arxiv.org/pdf/1709.07432v2.pdf
PWC	https://paperswithcode.com/paper/dynamic-evaluation-of-neural-sequence-models
Repo	https://github.com/benkrause/dynamiceval-transformer
Framework	tf
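
The core idea in the abstract, adapting a model to recent history by gradient descent while it is being evaluated, can be shown on a toy model. The sketch below uses a context-free softmax over a small vocabulary and plain SGD after each token; the real method applies to neural sequence models and its exact update rule may differ, so every name and value here is an assumption.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_eval(tokens, logits, lr=0.5):
    """Score each token, then take a gradient step on it before scoring the next."""
    logits = list(logits)  # adapt a copy; the original parameters are untouched
    losses = []
    for t in tokens:
        probs = softmax(logits)
        losses.append(-math.log(probs[t]))  # loss is recorded before adapting
        # Gradient of -log p[t] with respect to the logits is probs - onehot(t).
        for i in range(len(logits)):
            grad = probs[i] - (1.0 if i == t else 0.0)
            logits[i] -= lr * grad
    return losses


# A sequence that keeps repeating token 2, starting from a uniform model.
losses = dynamic_eval([2, 2, 2, 2], logits=[0.0, 0.0, 0.0])
```

Because each token is scored before the model is updated on it, the evaluation stays fair, and the per-token loss falls as the repeated pattern is absorbed.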

Fast YOLO: A Fast You Only Look Once System for Real-time Embedded Object Detection in Video


Title	Fast YOLO: A Fast You Only Look Once System for Real-time Embedded Object Detection in Video
Authors	Mohammad Javad Shafiee, Brendan Chywl, Francis Li, Alexander Wong
Abstract	Object detection is considered one of the most challenging problems in the field of computer vision, as it involves the combination of object classification and object localization within a scene. Recently, deep neural networks (DNNs) have been demonstrated to achieve superior object detection performance compared to other approaches, with YOLOv2 (an improved You Only Look Once model) being one of the state-of-the-art DNN-based object detection methods in terms of both speed and accuracy. Although YOLOv2 can achieve real-time performance on a powerful GPU, it remains very challenging to leverage this approach for real-time object detection in video on embedded computing devices with limited computational power and limited memory. In this paper, we propose a new framework called Fast YOLO, a fast You Only Look Once framework which accelerates YOLOv2 to be able to perform object detection in video on embedded devices in a real-time manner. First, we leverage the evolutionary deep intelligence framework to evolve the YOLOv2 network architecture and produce an optimized architecture (referred to as O-YOLOv2 here) that has 2.8X fewer parameters with just a ~2% IOU drop. To further reduce power consumption on embedded devices while maintaining performance, a motion-adaptive inference method is introduced into the proposed Fast YOLO framework to reduce the frequency of deep inference with O-YOLOv2 based on temporal motion characteristics. Experimental results show that the proposed Fast YOLO framework can reduce the number of deep inferences by an average of 38.13% and achieve an average speedup of ~3.3X for object detection in video compared to the original YOLOv2, leading Fast YOLO to run at an average of ~18 FPS on an Nvidia Jetson TX1 embedded system.
Tasks	Object Classification, Object Detection, Object Localization, Real-Time Object Detection
Published	2017-09-18
URL	http://arxiv.org/abs/1709.05943v1
PDF	http://arxiv.org/pdf/1709.05943v1.pdf
PWC	https://paperswithcode.com/paper/fast-yolo-a-fast-you-only-look-once-system
Repo	https://github.com/Vonski/wdi19
Framework	none
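
The motion-adaptive inference step described above, running the deep detector only when the scene has changed enough, can be sketched with simple frame differencing. The abstract does not specify how motion is measured, so the mean absolute pixel difference, the threshold, and all names below are assumptions made for illustration.

```python
def mean_abs_diff(frame_a, frame_b):
    """Average absolute per-pixel difference between two flattened frames."""
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)


def motion_adaptive_detect(frames, detector, threshold):
    """Run `detector` only on frames that differ enough from the last inferred one."""
    results = []
    reference = None  # last frame on which the detector actually ran
    cached = None     # detections reused for low-motion frames
    calls = 0
    for frame in frames:
        if reference is None or mean_abs_diff(frame, reference) > threshold:
            cached = detector(frame)
            reference = frame
            calls += 1
        results.append(cached)
    return results, calls


# Toy "video" of 2-pixel frames and a stand-in detector (sum of pixel values).
frames = [[0, 0], [0, 1], [9, 9], [9, 9]]
results, calls = motion_adaptive_detect(frames, detector=sum, threshold=1.0)
```

Here the detector runs on two of the four frames; the other two reuse the cached result, which is the kind of saving the abstract reports as a reduction in the number of deep inferences.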