October 20, 2019

3448 words 17 mins read

Paper Group AWR 200

Block-Value Symmetries in Probabilistic Graphical Models. DVC: An End-to-end Deep Video Compression Framework. Switchable Temporal Propagation Network. MGANet: A Robust Model for Quality Enhancement of Compressed Video. DeepFall – Non-invasive Fall Detection with Deep Spatio-Temporal Convolutional Autoencoders. Morphosyntactic Tagging with a Meta- …

Block-Value Symmetries in Probabilistic Graphical Models

Title Block-Value Symmetries in Probabilistic Graphical Models
Authors Gagan Madan, Ankit Anand, Mausam, Parag Singla
Abstract One popular way for lifted inference in probabilistic graphical models is to first merge symmetric states into a single cluster (orbit) and then use these for downstream inference, via variations of orbital MCMC [Niepert, 2012]. These orbits are represented compactly using permutations over variables, and variable-value (VV) pairs, but they can miss several state symmetries in a domain. We define the notion of permutations over block-value (BV) pairs, where a block is a set of variables. BV strictly generalizes VV symmetries, and can compute many more symmetries for increasing block sizes. To operationalize use of BV permutations in lifted inference, we describe 1) an algorithm to compute BV permutations given a block partition of the variables, 2) BV-MCMC, an extension of orbital MCMC that can sample from BV orbits, and 3) a heuristic to suggest good block partitions. Our experiments show that BV-MCMC can mix much faster compared to vanilla MCMC and orbital MCMC.
Tasks
Published 2018-07-02
URL http://arxiv.org/abs/1807.00643v2
PDF http://arxiv.org/pdf/1807.00643v2.pdf
PWC https://paperswithcode.com/paper/block-value-symmetries-in-probabilistic
Repo https://github.com/dair-iitd/bv-mcmc
Framework none
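Orbital MCMC, which BV-MCMC extends, is easy to sketch: run an ordinary MCMC move, then jump to a symmetric state by applying a random product of symmetry generators. The sketch below handles only plain variable permutations (a dict mapping variable names to variable names), not the paper's block-value generalization, and the random walk over generators only approximates uniform orbit sampling.

```python
import random

def orbital_mcmc_step(state, mcmc_step, generators, walk_len=10):
    """One orbital-MCMC step (sketch): a standard MCMC move followed by a
    random walk over the symmetry group of the model.

    state:      dict mapping variable name -> value
    mcmc_step:  callable state -> state implementing a valid MCMC kernel
    generators: list of permutations, each a dict variable -> variable
    """
    state = mcmc_step(state)                  # ordinary Gibbs / MH move
    for _ in range(walk_len):                 # approximate a random group element
        g = random.choice(generators)
        state = {g[var]: val for var, val in state.items()}
    return state
```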

DVC: An End-to-end Deep Video Compression Framework

Title DVC: An End-to-end Deep Video Compression Framework
Authors Guo Lu, Wanli Ouyang, Dong Xu, Xiaoyun Zhang, Chunlei Cai, Zhiyong Gao
Abstract Conventional video compression approaches use the predictive coding architecture and encode the corresponding motion information and residual information. In this paper, taking advantage of both the classical architecture in the conventional video compression method and the powerful non-linear representation ability of neural networks, we propose the first end-to-end video compression deep model that jointly optimizes all the components for video compression. Specifically, learning based optical flow estimation is utilized to obtain the motion information and reconstruct the current frames. Then we employ two auto-encoder style neural networks to compress the corresponding motion and residual information. All the modules are jointly learned through a single loss function, in which they collaborate with each other by considering the trade-off between reducing the number of compression bits and improving the quality of the decoded video. Experimental results show that the proposed approach can outperform the widely used video coding standard H.264 in terms of PSNR and be even on par with the latest standard H.265 in terms of MS-SSIM. Code is released at https://github.com/GuoLusjtu/DVC.
Tasks Optical Flow Estimation, Video Compression
Published 2018-11-30
URL http://arxiv.org/abs/1812.00101v3
PDF http://arxiv.org/pdf/1812.00101v3.pdf
PWC https://paperswithcode.com/paper/dvc-an-end-to-end-deep-video-compression
Repo https://github.com/GuoLusjtu/DVC
Framework none
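The single loss function the abstract mentions is a rate-distortion trade-off. Below is a minimal PyTorch-style sketch; `flow_net`, `motion_codec`, `residual_codec` and `warp_fn` are hypothetical callables standing in for the learned modules (each codec is assumed to return a reconstruction and an estimated bit cost), and the weight `lam` is a placeholder rather than a value from the paper.

```python
import torch

def rd_loss(x_cur, x_ref, flow_net, motion_codec, residual_codec, warp_fn,
            lam=256.0):
    """Rate-distortion objective (sketch): lam * distortion + total bits."""
    flow = flow_net(x_cur, x_ref)                     # learned optical flow
    flow_hat, bits_motion = motion_codec(flow)        # compress motion
    x_warp = warp_fn(x_ref, flow_hat)                 # motion compensation
    res_hat, bits_residual = residual_codec(x_cur - x_warp)
    x_hat = x_warp + res_hat                          # decoded frame
    distortion = torch.mean((x_cur - x_hat) ** 2)     # MSE distortion
    return lam * distortion + bits_motion + bits_residual
```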

Switchable Temporal Propagation Network

Title Switchable Temporal Propagation Network
Authors Sifei Liu, Guangyu Zhong, Shalini De Mello, Jinwei Gu, Varun Jampani, Ming-Hsuan Yang, Jan Kautz
Abstract Videos contain highly redundant information between frames. Such redundancy has been extensively studied in video compression and encoding, but is less explored for more advanced video processing. In this paper, we propose a learnable unified framework for propagating a variety of visual properties of video images, including but not limited to color, high dynamic range (HDR), and segmentation information, where the properties are available for only a few key-frames. Our approach is based on a temporal propagation network (TPN), which models the transition-related affinity between a pair of frames in a purely data-driven manner. We theoretically prove two essential factors for TPN: (a) by regularizing the global transformation matrix as orthogonal, the “style energy” of the property can be well preserved during propagation; (b) such regularization can be achieved by the proposed switchable TPN with bi-directional training on pairs of frames. We apply the switchable TPN to three tasks: colorizing a gray-scale video based on a few color key-frames, generating an HDR video from a low dynamic range (LDR) video and a few HDR frames, and propagating a segmentation mask from the first frame in videos. Experimental results show that our approach is significantly more accurate and efficient than the state-of-the-art methods.
Tasks Video Compression
Published 2018-04-23
URL http://arxiv.org/abs/1804.08758v2
PDF http://arxiv.org/pdf/1804.08758v2.pdf
PWC https://paperswithcode.com/paper/switchable-temporal-propagation-network
Repo https://github.com/Liusifei/UVC
Framework pytorch
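Factor (a) in the abstract, regularizing the global transformation matrix to be orthogonal so the property's "style energy" is preserved, is commonly encouraged with a soft penalty of the form ||WᵀW − I||². The generic sketch below illustrates that penalty; it is not taken from the released UVC code.

```python
import torch

def orthogonality_penalty(w):
    """Soft orthogonality regulariser: || W^T W - I ||_F^2.
    Driving this term to zero makes W norm-preserving, which corresponds to
    the 'style energy' preservation property discussed in the paper."""
    gram = w.transpose(-2, -1) @ w
    eye = torch.eye(w.shape[-1], device=w.device, dtype=w.dtype)
    return ((gram - eye) ** 2).sum()
```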

MGANet: A Robust Model for Quality Enhancement of Compressed Video

Title MGANet: A Robust Model for Quality Enhancement of Compressed Video
Authors Xiandong Meng, Xuan Deng, Shuyuan Zhu, Shuaicheng Liu, Chuan Wang, Chen Chen, Bing Zeng
Abstract In video compression, most of the existing deep learning approaches concentrate on the visual quality of a single frame, while ignoring the useful priors as well as the temporal information of adjacent frames. In this paper, we propose a multi-frame guided attention network (MGANet) to enhance the quality of compressed videos. Our network is composed of a temporal encoder that discovers inter-frame relations, a guided encoder-decoder subnet that encodes and enhances the visual patterns of the target frame, and a multi-supervised reconstruction component that aggregates information to predict details. We design a bidirectional residual convolutional LSTM unit to implicitly discover frame variations over time with respect to the target frame. Meanwhile, the guided map is proposed to guide our network to concentrate more on the block boundary. Our approach takes advantage of intra-frame prior information and inter-frame information to improve the quality of compressed video. Experimental results show the robustness and superior performance of the proposed method. Code is available at https://github.com/mengab/MGANet
Tasks Video Compression
Published 2018-11-22
URL http://arxiv.org/abs/1811.09150v4
PDF http://arxiv.org/pdf/1811.09150v4.pdf
PWC https://paperswithcode.com/paper/mganet-a-robust-model-for-quality-enhancement
Repo https://github.com/mengab/MGANet
Framework pytorch

DeepFall – Non-invasive Fall Detection with Deep Spatio-Temporal Convolutional Autoencoders

Title DeepFall – Non-invasive Fall Detection with Deep Spatio-Temporal Convolutional Autoencoders
Authors Jacob Nogas, Shehroz S. Khan, Alex Mihailidis
Abstract Human falls rarely occur; however, detecting falls is very important from the health and safety perspective. Due to the rarity of falls, it is difficult to employ supervised classification techniques to detect them. Moreover, in these highly skewed situations it is also difficult to extract domain specific features to identify falls. In this paper, we present a novel framework, DeepFall, which formulates the fall detection problem as an anomaly detection problem. The DeepFall framework presents the novel use of deep spatio-temporal convolutional autoencoders to learn spatial and temporal features from normal activities using non-invasive sensing modalities. We also present a new anomaly scoring method that combines the reconstruction scores of frames across a video sequence to detect unseen falls. We tested the DeepFall framework on three publicly available datasets collected through non-invasive sensing modalities (thermal and depth cameras), and show superior results in comparison to traditional autoencoder and convolutional autoencoder methods to identify unseen falls.
Tasks Anomaly Detection
Published 2018-08-30
URL http://arxiv.org/abs/1809.00977v2
PDF http://arxiv.org/pdf/1809.00977v2.pdf
PWC https://paperswithcode.com/paper/deepfall-non-invasive-fall-detection-with
Repo https://github.com/JJN123/Fall-Detection
Framework tf
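The anomaly-scoring idea, per-frame reconstruction errors combined across a window of frames, can be sketched as follows. The mean/max aggregation over a sliding window is illustrative, not necessarily the exact scores defined in the paper.

```python
import numpy as np

def anomaly_scores(frames, recon, window=8):
    """Per-window anomaly scores from frame reconstruction errors (sketch).
    frames, recon: arrays of shape (T, H, W); higher score = more anomalous."""
    err = np.mean((frames - recon) ** 2, axis=(1, 2))          # per-frame MSE
    scores = []
    for t in range(len(err) - window + 1):
        w = err[t:t + window]
        scores.append({"mean": float(w.mean()), "max": float(w.max())})
    return scores
```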

Morphosyntactic Tagging with a Meta-BiLSTM Model over Context Sensitive Token Encodings

Title Morphosyntactic Tagging with a Meta-BiLSTM Model over Context Sensitive Token Encodings
Authors Bernd Bohnet, Ryan McDonald, Goncalo Simoes, Daniel Andor, Emily Pitler, Joshua Maynez
Abstract The rise of neural networks, and particularly recurrent neural networks, has produced significant advances in part-of-speech tagging accuracy. One characteristic common among these models is the presence of rich initial word encodings. These encodings typically are composed of a recurrent character-based representation with learned and pre-trained word embeddings. However, these encodings do not consider a context wider than a single word and it is only through subsequent recurrent layers that word or sub-word information interacts. In this paper, we investigate models that use recurrent neural networks with sentence-level context for initial character and word-based representations. In particular we show that optimal results are obtained by integrating these context sensitive representations through synchronized training with a meta-model that learns to combine their states. We present results on part-of-speech and morphological tagging with state-of-the-art performance on a number of languages.
Tasks Morphological Tagging, Part-Of-Speech Tagging, Word Embeddings
Published 2018-05-21
URL http://arxiv.org/abs/1805.08237v1
PDF http://arxiv.org/pdf/1805.08237v1.pdf
PWC https://paperswithcode.com/paper/morphosyntactic-tagging-with-a-meta-bilstm
Repo https://github.com/google/meta_tagger
Framework tf
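A minimal sketch of the meta-model idea: take sentence-level character-based and word-based encodings produced by two separately trained BiLSTMs and let a small network learn to combine them into tag scores. The layer sizes and the plain concatenation are assumptions, not the configuration of the released meta_tagger code.

```python
import torch
import torch.nn as nn

class MetaCombiner(nn.Module):
    """Combine the states of a character-level and a word-level model
    with a small meta-network that predicts the tag (sketch)."""
    def __init__(self, char_dim, word_dim, hidden, n_tags):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(char_dim + word_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_tags))

    def forward(self, char_states, word_states):
        # char_states: (batch, seq, char_dim); word_states: (batch, seq, word_dim)
        return self.mlp(torch.cat([char_states, word_states], dim=-1))
```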

Object Pose Estimation from Monocular Image using Multi-View Keypoint Correspondence

Title Object Pose Estimation from Monocular Image using Multi-View Keypoint Correspondence
Authors Jogendra Nath Kundu, Rahul M. V., Aditya Ganeshan, R. Venkatesh Babu
Abstract Understanding the geometry and pose of objects in 2D images is a fundamental necessity for a wide range of real world applications. Driven by deep neural networks, recent methods have brought significant improvements to object pose estimation. However, they suffer due to scarcity of keypoint/pose-annotated real images and hence cannot exploit the object’s 3D structural information effectively. In this work, we propose a data-efficient method which utilizes the geometric regularity of intraclass objects for pose estimation. First, we learn pose-invariant local descriptors of object parts from simple 2D RGB images. These descriptors, along with keypoints obtained from renders of a fixed 3D template model, are then used to generate keypoint correspondence maps for a given monocular real image. Finally, a pose estimation network predicts the 3D pose of the object using these correspondence maps. This pipeline is further extended to a multi-view approach, which assimilates keypoint information from correspondence sets generated from multiple views of the 3D template model. Fusion of multi-view information significantly improves the geometric comprehension of the system, which in turn enhances the pose estimation performance. Furthermore, use of the correspondence framework responsible for learning the pose-invariant keypoint descriptors also allows us to effectively alleviate the data-scarcity problem. This enables our method to achieve state-of-the-art performance on multiple real-image viewpoint estimation datasets, such as Pascal3D+ and ObjectNet3D. To encourage reproducible research, we have released the code for our proposed approach.
Tasks Pose Estimation, Viewpoint Estimation
Published 2018-09-03
URL http://arxiv.org/abs/1809.00553v1
PDF http://arxiv.org/pdf/1809.00553v1.pdf
PWC https://paperswithcode.com/paper/object-pose-estimation-from-monocular-image
Repo https://github.com/val-iisc/iSPA-Net
Framework none

Quootstrap: Scalable Unsupervised Extraction of Quotation-Speaker Pairs from Large News Corpora via Bootstrapping

Title Quootstrap: Scalable Unsupervised Extraction of Quotation-Speaker Pairs from Large News Corpora via Bootstrapping
Authors Dario Pavllo, Tiziano Piccardi, Robert West
Abstract We propose Quootstrap, a method for extracting quotations, as well as the names of the speakers who uttered them, from large news corpora. Whereas prior work has addressed this problem primarily with supervised machine learning, our approach follows a fully unsupervised bootstrapping paradigm. It leverages the redundancy present in large news corpora, more precisely, the fact that the same quotation often appears across multiple news articles in slightly different contexts. Starting from a few seed patterns, such as [“Q”, said S.], our method extracts a set of quotation-speaker pairs (Q, S), which are in turn used for discovering new patterns expressing the same quotations; the process is then repeated with the larger pattern set. Our algorithm is highly scalable, which we demonstrate by running it on the large ICWSM 2011 Spinn3r corpus. Validating our results against a crowdsourced ground truth, we obtain 90% precision at 40% recall using a single seed pattern, with significantly higher recall values for more frequently reported (and thus likely more interesting) quotations. Finally, we showcase the usefulness of our algorithm’s output for computational social science by analyzing the sentiment expressed in our extracted quotations.
Tasks
Published 2018-04-07
URL http://arxiv.org/abs/1804.02525v1
PDF http://arxiv.org/pdf/1804.02525v1.pdf
PWC https://paperswithcode.com/paper/quootstrap-scalable-unsupervised-extraction
Repo https://github.com/epfl-dlab/quootstrap
Framework none
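One bootstrapping iteration of the kind the abstract describes can be sketched with regular expressions. The seed pattern corresponds to the ["Q", said S.] example from the abstract; the pattern-mining step and corpus handling are heavily simplified relative to the Spark-based quootstrap implementation.

```python
import re

SEED_PATTERNS = [r'"(?P<Q>[^"]+)",? said (?P<S>[A-Z][\w.]+(?: [A-Z][\w.]+)*)\.']

def bootstrap(sentences, patterns, iterations=3):
    """Alternate between extracting (quotation, speaker) pairs with the
    current patterns and mining new patterns from other sentences that
    contain an already-known pair (sketch)."""
    pairs = set()
    for _ in range(iterations):
        for s in sentences:
            for p in patterns:
                m = re.search(p, s)
                if m:
                    pairs.add((m.group("Q"), m.group("S")))
        # discover new patterns: replace a known Q and S with placeholders
        for s in sentences:
            for q, sp in pairs:
                if q in s and sp in s:
                    new = re.escape(s).replace(re.escape(q), '(?P<Q>[^"]+)')
                    new = new.replace(re.escape(sp), r"(?P<S>[A-Z][\w.]+)")
                    if new not in patterns:
                        patterns.append(new)
    return pairs, patterns
```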

Towards Visual Feature Translation

Title Towards Visual Feature Translation
Authors Jie Hu, Rongrong Ji, Hong Liu, Shengchuan Zhang, Cheng Deng, Qi Tian
Abstract Most existing visual search systems are deployed based upon fixed kinds of visual features, which prohibits the feature reusing across different systems or when upgrading systems with a new type of feature. Such a setting is obviously inflexible and time/memory consuming, which is indeed mendable if visual features can be “translated” across systems. In this paper, we make the first attempt towards visual feature translation to break through the barrier of using features across different visual search systems. To this end, we propose a Hybrid Auto-Encoder (HAE) to translate visual features, which learns a mapping by minimizing the translation and reconstruction errors. Based upon HAE, an Undirected Affinity Measurement (UAM) is further designed to quantify the affinity among different types of visual features. Extensive experiments have been conducted on several public datasets with sixteen different types of widely-used features in visual search systems. Quantitative results show the encouraging possibilities of feature translation. For the first time, the affinity among widely-used features like SIFT and DELF is reported.
Tasks
Published 2018-12-03
URL http://arxiv.org/abs/1812.00573v2
PDF http://arxiv.org/pdf/1812.00573v2.pdf
PWC https://paperswithcode.com/paper/towards-visual-feature-translation
Repo https://github.com/hujiecpp/VisualFeatureTranslation
Framework none
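A rough sketch of the Hybrid Auto-Encoder objective described in the abstract: a shared encoder with one decoder that translates source features into the target feature space and another that reconstructs the source. The layer sizes, the equal loss weighting and the use of plain MSE are assumptions, not details taken from the released code.

```python
import torch
import torch.nn as nn

class HybridAutoEncoder(nn.Module):
    """Translate source features to target features while also
    reconstructing the source (sketch of the HAE idea)."""
    def __init__(self, src_dim, tgt_dim, hidden=512):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(src_dim, hidden), nn.ReLU())
        self.dec_translate = nn.Linear(hidden, tgt_dim)    # source -> target
        self.dec_reconstruct = nn.Linear(hidden, src_dim)  # source -> source

    def forward(self, f_src):
        h = self.encoder(f_src)
        return self.dec_translate(h), self.dec_reconstruct(h)

def hae_loss(model, f_src, f_tgt):
    """Translation error plus reconstruction error (equal weights assumed)."""
    f_trans, f_recon = model(f_src)
    return torch.mean((f_trans - f_tgt) ** 2) + torch.mean((f_recon - f_src) ** 2)
```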

Action Recognition for Depth Video using Multi-view Dynamic Images

Title Action Recognition for Depth Video using Multi-view Dynamic Images
Authors Yang Xiao, Jun Chen, Yancheng Wang, Zhiguo Cao, Joey Tianyi Zhou, Xiang Bai
Abstract Dynamic imaging is a recently proposed action description paradigm for simultaneously capturing motion and temporal evolution information, particularly in the context of deep convolutional neural networks (CNNs). Compared with optical flow for motion characterization, dynamic imaging exhibits superior efficiency and compactness. Inspired by the success of dynamic imaging in RGB video, this study extends it to the depth domain. To better exploit three-dimensional (3D) characteristics, multi-view dynamic images are proposed. In particular, the raw depth video is densely projected with respect to different virtual imaging viewpoints by rotating the virtual camera within the 3D space. Subsequently, dynamic images are extracted from the obtained multi-view depth videos and multi-view dynamic images are thus constructed from these images. Accordingly, more view-tolerant visual cues can be involved. A novel CNN model is then proposed to perform feature learning on multi-view dynamic images. Particularly, the dynamic images from different views share the same convolutional layers but correspond to different fully connected layers. This is aimed at enhancing the tuning effectiveness on shallow convolutional layers by alleviating the gradient vanishing problem. Moreover, as the spatial occurrence variation of the actions may impair the CNN, an action proposal approach is also put forth. In experiments, the proposed approach can achieve state-of-the-art performance on three challenging datasets.
Tasks Optical Flow Estimation, Temporal Action Localization
Published 2018-06-29
URL http://arxiv.org/abs/1806.11269v3
PDF http://arxiv.org/pdf/1806.11269v3.pdf
PWC https://paperswithcode.com/paper/action-recognition-for-depth-video-using
Repo https://github.com/3huo/MVDI
Framework none
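The multi-view step, densely re-projecting the raw depth video from rotated virtual viewpoints, can be sketched as a pinhole back-projection followed by a rotation about the vertical axis and a re-projection. The camera intrinsics are placeholders, and the sketch omits z-buffering and hole filling.

```python
import numpy as np

def rotate_depth_view(depth, fx, fy, cx, cy, angle_deg):
    """Back-project a depth map to 3D, rotate the point cloud about the Y axis,
    and re-project to a new virtual-view depth image (sketch)."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]
    z = depth.astype(np.float32)
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    a = np.deg2rad(angle_deg)
    rot_y = np.array([[np.cos(a), 0, np.sin(a)],
                      [0, 1, 0],
                      [-np.sin(a), 0, np.cos(a)]], dtype=np.float32)
    pts = pts @ rot_y.T                              # rotate the virtual camera
    new = np.zeros_like(z)
    valid = pts[:, 2] > 0
    u2 = np.round(pts[valid, 0] * fx / pts[valid, 2] + cx).astype(int)
    v2 = np.round(pts[valid, 1] * fy / pts[valid, 2] + cy).astype(int)
    keep = (u2 >= 0) & (u2 < w) & (v2 >= 0) & (v2 < h)
    new[v2[keep], u2[keep]] = pts[valid, 2][keep]    # no z-buffering: last write wins
    return new
```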

General-purpose Tagging of Freesound Audio with AudioSet Labels: Task Description, Dataset, and Baseline

Title General-purpose Tagging of Freesound Audio with AudioSet Labels: Task Description, Dataset, and Baseline
Authors Eduardo Fonseca, Manoj Plakal, Frederic Font, Daniel P. W. Ellis, Xavier Favory, Jordi Pons, Xavier Serra
Abstract This paper describes Task 2 of the DCASE 2018 Challenge, titled “General-purpose audio tagging of Freesound content with AudioSet labels”. This task was hosted on the Kaggle platform as “Freesound General-Purpose Audio Tagging Challenge”. The goal of the task is to build an audio tagging system that can recognize the category of an audio clip from a subset of 41 diverse categories drawn from the AudioSet Ontology. We present the task, the dataset prepared for the competition, and a baseline system.
Tasks Audio Tagging
Published 2018-07-26
URL http://arxiv.org/abs/1807.09902v3
PDF http://arxiv.org/pdf/1807.09902v3.pdf
PWC https://paperswithcode.com/paper/general-purpose-tagging-of-freesound-audio
Repo https://github.com/asudomoeva/Audio-Tagging
Framework tf

Computing CNN Loss and Gradients for Pose Estimation with Riemannian Geometry

Title Computing CNN Loss and Gradients for Pose Estimation with Riemannian Geometry
Authors Benjamin Hou, Nina Miolane, Bishesh Khanal, Matthew C. H. Lee, Amir Alansary, Steven McDonagh, Jo V. Hajnal, Daniel Rueckert, Ben Glocker, Bernhard Kainz
Abstract Pose estimation, i.e. predicting a 3D rigid transformation with respect to a fixed co-ordinate frame in SE(3), is an omnipresent problem in medical image analysis with applications such as image rigid registration, anatomical standard plane detection, tracking and device/camera pose estimation. Deep learning methods often parameterise a pose with a representation that separates rotation and translation. As commonly available frameworks do not provide means to calculate loss on a manifold, regression is usually performed using the L2-norm independently on the rotation’s and the translation’s parameterisations, which is a metric for linear spaces that does not take into account the Lie group structure of SE(3). In this paper, we propose a general Riemannian formulation of the pose estimation problem. We propose to train the CNN directly on SE(3) equipped with a left-invariant Riemannian metric, coupling the prediction of the translation and rotation defining the pose. At each training step, the ground truth and predicted pose are elements of the manifold, where the loss is calculated as the Riemannian geodesic distance. We then compute the optimisation direction by back-propagating the gradient with respect to the predicted pose on the tangent space of the manifold SE(3) and update the network weights. We thoroughly evaluate the effectiveness of our loss function by comparing its performance with popular and commonly used existing methods, on tasks such as image-based localisation and intensity-based 2D/3D registration. We also show that hyper-parameters, used in our loss function to weight the contribution between rotations and translations, can be intrinsically calculated from the dataset to achieve greater performance margins.
Tasks Pose Estimation
Published 2018-05-02
URL http://arxiv.org/abs/1805.01026v3
PDF http://arxiv.org/pdf/1805.01026v3.pdf
PWC https://paperswithcode.com/paper/computing-cnn-loss-and-gradients-for-pose
Repo https://github.com/farrell236/SVRnet
Framework tf
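For a left-invariant metric, the geodesic loss the abstract describes reduces to a norm of the Lie-algebra logarithm of the relative pose; a generic form (the exact norm and rotation/translation weighting used by the authors may differ) is

```latex
\mathcal{L}(T_{\mathrm{pred}}, T_{\mathrm{gt}})
  = \bigl\| \log\!\bigl( T_{\mathrm{gt}}^{-1}\, T_{\mathrm{pred}} \bigr) \bigr\|^{2},
  \qquad T_{\mathrm{pred}},\, T_{\mathrm{gt}} \in SE(3),
```

where log maps into the tangent space se(3), and the norm may weight the rotational and translational components separately, which is where the dataset-derived hyper-parameters mentioned at the end of the abstract enter.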

A Flexible and Adaptive Framework for Abstention Under Class Imbalance

Title A Flexible and Adaptive Framework for Abstention Under Class Imbalance
Authors Avanti Shrikumar, Amr Alexandari, Anshul Kundaje
Abstract In practical applications of machine learning, it is often desirable to identify and abstain on examples where the model’s predictions are likely to be incorrect. Much of the prior work on this topic focused on out-of-distribution detection or performance metrics such as top-k accuracy. Comparatively little attention was given to metrics such as area-under-the-curve or Cohen’s Kappa, which are extremely relevant for imbalanced datasets. Abstention strategies aimed at top-k accuracy can produce poor results on these metrics when applied to imbalanced datasets, even when all examples are in-distribution. We propose a framework to address this gap. Our framework leverages the insight that calibrated probability estimates can be used as a proxy for the true class labels, thereby allowing us to estimate the change in an arbitrary metric if an example were abstained on. Using this framework, we derive computationally efficient metric-specific abstention algorithms for optimizing the sensitivity at a target specificity level, the area under the ROC, and the weighted Cohen’s Kappa. Because our method relies only on calibrated probability estimates, we further show that by leveraging recent work on domain adaptation under label shift, we can generalize to test-set distributions that may have a different class imbalance compared to the training set distribution. On various experiments involving medical imaging, natural language processing, computer vision and genomics, we demonstrate the effectiveness of our approach. Source code available at https://github.com/blindauth/abstention. Colab notebooks reproducing results available at https://github.com/blindauth/abstention_experiments.
Tasks Domain Adaptation, Out-of-Distribution Detection
Published 2018-02-20
URL https://arxiv.org/abs/1802.07024v4
PDF https://arxiv.org/pdf/1802.07024v4.pdf
PWC https://paperswithcode.com/paper/selective-classification-via-curve
Repo https://github.com/kundajelab/abstention
Framework none
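The core trick, treating calibrated probabilities as a proxy for the unknown labels when estimating how a metric would change under abstention, can be sketched for plain accuracy. The authors' metric-specific algorithms (sensitivity at a target specificity, auROC, weighted kappa) are more involved, and the abstention fraction here is arbitrary.

```python
import numpy as np

def abstain_by_estimated_accuracy(probs, abstain_frac=0.1):
    """Rank examples by their estimated contribution to accuracy, using the
    calibrated probability of the predicted class as a proxy for correctness,
    and abstain on the least certain fraction (sketch)."""
    est_correct = probs.max(axis=1)               # proxy for P(prediction correct)
    n_abstain = int(len(probs) * abstain_frac)
    order = np.argsort(est_correct)               # least certain first
    abstain_idx = order[:n_abstain]
    keep_idx = order[n_abstain:]
    est_acc_after = est_correct[keep_idx].mean()  # estimated post-abstention accuracy
    return abstain_idx, est_acc_after
```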

A Shared Attention Mechanism for Interpretation of Neural Automatic Post-Editing Systems

Title A Shared Attention Mechanism for Interpretation of Neural Automatic Post-Editing Systems
Authors Inigo Jauregi Unanue, Ehsan Zare Borzeshi, Massimo Piccardi
Abstract Automatic post-editing (APE) systems aim to correct the systematic errors made by machine translators. In this paper, we propose a neural APE system that encodes the source (src) and machine translated (mt) sentences with two separate encoders, but leverages a shared attention mechanism to better understand how the two inputs contribute to the generation of the post-edited (pe) sentences. Our empirical observations have shown that when the mt is incorrect, the attention shifts weight toward tokens in the src sentence to properly edit the incorrect translation. The model has been trained and evaluated on the official data from the WMT16 and WMT17 APE IT domain English-German shared tasks. Additionally, we have used the extra 500K artificial data provided by the shared task. Our system has been able to reproduce the accuracies of systems trained with the same data, while at the same time providing better interpretability.
Tasks Automatic Post-Editing
Published 2018-07-01
URL http://arxiv.org/abs/1807.00248v1
PDF http://arxiv.org/pdf/1807.00248v1.pdf
PWC https://paperswithcode.com/paper/a-shared-attention-mechanism-for
Repo https://github.com/ijauregiCMCRC/Shared_Attention_for_APE
Framework pytorch

Detecting Irregular Patterns in IoT Streaming Data for Fall Detection

Title Detecting Irregular Patterns in IoT Streaming Data for Fall Detection
Authors Sazia Mahfuz, Haruna Isah, Farhana Zulkernine, Peter Nicholls
Abstract Detecting patterns in real time streaming data has been an interesting and challenging data analytics problem. With the proliferation of a variety of sensor devices, real-time analytics of data from the Internet of Things (IoT) to learn regular and irregular patterns has become an important machine learning problem to enable predictive analytics for automated notification and decision support. In this work, we address the problem of learning an irregular human activity pattern, fall, from streaming IoT data from wearable sensors. We present a deep neural network model for detecting falls based on accelerometer data, giving 98.75 percent accuracy using an online physical activity monitoring dataset called “MobiAct”, which was published by Vavoulas et al. The initial model was developed using IBM Watson Studio and then later transferred and deployed on IBM Cloud with the streaming analytics service supported by IBM Streams for monitoring real-time IoT data. We also present the system architecture of the real-time fall detection framework that we intend to use with mbientlabs wearable health monitoring sensors for real-time patient monitoring at retirement homes or rehabilitation clinics.
Tasks
Published 2018-11-16
URL http://arxiv.org/abs/1811.06672v1
PDF http://arxiv.org/pdf/1811.06672v1.pdf
PWC https://paperswithcode.com/paper/detecting-irregular-patterns-in-iot-streaming
Repo https://github.com/SaziaM/IEMCON2018
Framework none
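A minimal sketch of the kind of model the abstract describes: a small dense network classifying fixed-length windows of tri-axial accelerometer readings as fall vs. non-fall. The window length, layer sizes and the use of Keras are assumptions, not the paper's exact architecture.

```python
from tensorflow import keras

def build_fall_detector(window=200, channels=3):
    """Binary fall / no-fall classifier over accelerometer windows (sketch)."""
    model = keras.Sequential([
        keras.Input(shape=(window, channels)),         # window of (x, y, z) readings
        keras.layers.Flatten(),
        keras.layers.Dense(128, activation="relu"),
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dense(1, activation="sigmoid"),   # P(fall)
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model
```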