Paper Group AWR 200
Block-Value Symmetries in Probabilistic Graphical Models. DVC: An End-to-end Deep Video Compression Framework. Switchable Temporal Propagation Network. MGANet: A Robust Model for Quality Enhancement of Compressed Video. DeepFall – Non-invasive Fall Detection with Deep Spatio-Temporal Convolutional Autoencoders. Morphosyntactic Tagging with a Meta- …
Block-Value Symmetries in Probabilistic Graphical Models
Title | Block-Value Symmetries in Probabilistic Graphical Models |
Authors | Gagan Madan, Ankit Anand, Mausam, Parag Singla |
Abstract | One popular way for lifted inference in probabilistic graphical models is to first merge symmetric states into a single cluster (orbit) and then use these for downstream inference, via variations of orbital MCMC [Niepert, 2012]. These orbits are represented compactly using permutations over variables, and variable-value (VV) pairs, but they can miss several state symmetries in a domain. We define the notion of permutations over block-value (BV) pairs, where a block is a set of variables. BV strictly generalizes VV symmetries, and can compute many more symmetries for increasing block sizes. To operationalize use of BV permutations in lifted inference, we describe 1) an algorithm to compute BV permutations given a block partition of the variables, 2) BV-MCMC, an extension of orbital MCMC that can sample from BV orbits, and 3) a heuristic to suggest good block partitions. Our experiments show that BV-MCMC can mix much faster compared to vanilla MCMC and orbital MCMC. |
Tasks | |
Published | 2018-07-02 |
URL | http://arxiv.org/abs/1807.00643v2 |
PDF | http://arxiv.org/pdf/1807.00643v2.pdf |
PWC | https://paperswithcode.com/paper/block-value-symmetries-in-probabilistic |
Repo | https://github.com/dair-iitd/bv-mcmc |
Framework | none |
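A minimal sketch of the orbital-style move that BV-MCMC builds on: after an ordinary Metropolis step, the chain jumps to a uniformly chosen symmetric state, which is always accepted because the unnormalized score is invariant under the symmetry. The toy model, the single block-swap symmetry, and all constants below are illustrative, not the paper's algorithm or its computed permutation groups.

```python
import numpy as np

# Toy model: 4 binary variables in two blocks of 2. The unnormalized log-score
# only looks at each block's contents, so exchanging the two blocks' values
# (a block-value style symmetry) leaves the score unchanged.
def log_score(x):
    b1, b2 = x[:2], x[2:]
    return 1.5 * float(b1.sum() == 2) + 1.5 * float(b2.sum() == 2) - 0.2 * x.sum()

def swap_blocks(x):
    """Hypothetical BV symmetry: exchange the values of block 1 and block 2."""
    return np.concatenate([x[2:], x[:2]])

SYMMETRIES = [lambda x: x.copy(), swap_blocks]   # identity + one block permutation

def bv_mcmc_sketch(n_steps, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.integers(0, 2, size=4)
    samples = []
    for _ in range(n_steps):
        # 1) ordinary single-site Metropolis flip
        i = rng.integers(4)
        y = x.copy(); y[i] ^= 1
        if np.log(rng.random()) < log_score(y) - log_score(x):
            x = y
        # 2) "orbital" move: jump to a uniformly chosen symmetric state.
        #    The score is invariant, so the jump never changes the target.
        x = SYMMETRIES[rng.integers(len(SYMMETRIES))](x)
        samples.append(x.copy())
    return np.array(samples)

print(bv_mcmc_sketch(2000).mean(axis=0))   # marginal frequency of each variable
```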
DVC: An End-to-end Deep Video Compression Framework
Title | DVC: An End-to-end Deep Video Compression Framework |
Authors | Guo Lu, Wanli Ouyang, Dong Xu, Xiaoyun Zhang, Chunlei Cai, Zhiyong Gao |
Abstract | Conventional video compression approaches use the predictive coding architecture and encode the corresponding motion information and residual information. In this paper, taking advantage of both classical architecture in the conventional video compression method and the powerful non-linear representation ability of neural networks, we propose the first end-to-end video compression deep model that jointly optimizes all the components for video compression. Specifically, learning based optical flow estimation is utilized to obtain the motion information and reconstruct the current frames. Then we employ two auto-encoder style neural networks to compress the corresponding motion and residual information. All the modules are jointly learned through a single loss function, in which they collaborate with each other by considering the trade-off between reducing the number of compression bits and improving quality of the decoded video. Experimental results show that the proposed approach can outperform the widely used video coding standard H.264 in terms of PSNR and be even on par with the latest standard H.265 in terms of MS-SSIM. Code is released at https://github.com/GuoLusjtu/DVC. |
Tasks | Optical Flow Estimation, Video Compression |
Published | 2018-11-30 |
URL | http://arxiv.org/abs/1812.00101v3 |
PDF | http://arxiv.org/pdf/1812.00101v3.pdf |
PWC | https://paperswithcode.com/paper/dvc-an-end-to-end-deep-video-compression |
Repo | https://github.com/GuoLusjtu/DVC |
Framework | none |
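The single training loss described in the abstract is a rate-distortion trade-off between reconstruction quality and the estimated bits spent on the compressed motion and residual latents. A hedged sketch of that objective; the lambda value and the placeholder bit estimates are assumptions, not the paper's settings.

```python
import numpy as np

def rate_distortion_loss(frame, recon, bits_motion, bits_residual, lam=256.0):
    """Toy rate-distortion objective in the spirit of DVC's joint training:
    lambda * distortion + total estimated bitrate. `bits_motion` and
    `bits_residual` stand in for the entropy-model estimates of the bits
    spent on the motion and residual latents."""
    distortion = np.mean((frame - recon) ** 2)   # MSE distortion term
    rate = bits_motion + bits_residual           # rate term (e.g. bits per pixel)
    return lam * distortion + rate

# Hypothetical numbers just to exercise the function.
frame = np.random.rand(64, 64, 3)
recon = frame + 0.01 * np.random.randn(64, 64, 3)
print(rate_distortion_loss(frame, recon, bits_motion=0.02, bits_residual=0.08))
```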
Switchable Temporal Propagation Network
Title | Switchable Temporal Propagation Network |
Authors | Sifei Liu, Guangyu Zhong, Shalini De Mello, Jinwei Gu, Varun Jampani, Ming-Hsuan Yang, Jan Kautz |
Abstract | Videos contain highly redundant information between frames. Such redundancy has been extensively studied in video compression and encoding, but is less explored for more advanced video processing. In this paper, we propose a learnable unified framework for propagating a variety of visual properties of video images, including but not limited to color, high dynamic range (HDR), and segmentation information, where the properties are available for only a few key-frames. Our approach is based on a temporal propagation network (TPN), which models the transition-related affinity between a pair of frames in a purely data-driven manner. We theoretically prove two essential factors for TPN: (a) by regularizing the global transformation matrix as orthogonal, the “style energy” of the property can be well preserved during propagation; (b) such regularization can be achieved by the proposed switchable TPN with bi-directional training on pairs of frames. We apply the switchable TPN to three tasks: colorizing a gray-scale video based on a few color key-frames, generating an HDR video from a low dynamic range (LDR) video and a few HDR frames, and propagating a segmentation mask from the first frame in videos. Experimental results show that our approach is significantly more accurate and efficient than the state-of-the-art methods. |
Tasks | Video Compression |
Published | 2018-04-23 |
URL | http://arxiv.org/abs/1804.08758v2 |
PDF | http://arxiv.org/pdf/1804.08758v2.pdf |
PWC | https://paperswithcode.com/paper/switchable-temporal-propagation-network |
Repo | https://github.com/Liusifei/UVC |
Framework | pytorch |
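The key theoretical ingredient is keeping the global transformation matrix close to orthogonal so the propagated property's "style energy" is preserved. Below is a small illustrative penalty for that constraint; note the paper achieves the regularization through the switchable bi-directional training scheme rather than an explicit loss term like this one.

```python
import numpy as np

def orthogonality_penalty(G):
    """Soft penalty encouraging a square transformation matrix G to be
    orthogonal, i.e. G^T G close to the identity. Only an illustrative
    surrogate for the orthogonality property discussed in the paper."""
    k = G.shape[0]
    return np.linalg.norm(G.T @ G - np.eye(k), ord="fro") ** 2

G = np.linalg.qr(np.random.randn(8, 8))[0]                      # exactly orthogonal
print(orthogonality_penalty(G))                                 # ~0
print(orthogonality_penalty(G + 0.1 * np.random.randn(8, 8)))   # > 0
```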
MGANet: A Robust Model for Quality Enhancement of Compressed Video
Title | MGANet: A Robust Model for Quality Enhancement of Compressed Video |
Authors | Xiandong Meng, Xuan Deng, Shuyuan Zhu, Shuaicheng Liu, Chuan Wang, Chen Chen, Bing Zeng |
Abstract | In video compression, most of the existing deep learning approaches concentrate on the visual quality of a single frame, while ignoring the useful priors as well as the temporal information of adjacent frames. In this paper, we propose a multi-frame guided attention network (MGANet) to enhance the quality of compressed videos. Our network is composed of a temporal encoder that discovers inter-frame relations, a guided encoder-decoder subnet that encodes and enhances the visual patterns of the target frame, and a multi-supervised reconstruction component that aggregates information to predict details. We design a bidirectional residual convolutional LSTM unit to implicitly discover frame variations over time with respect to the target frame. Meanwhile, a guided map is proposed to guide our network to concentrate more on block boundaries. Our approach takes advantage of intra-frame prior information and inter-frame information to improve the quality of compressed video. Experimental results show the robustness and superior performance of the proposed method. Code is available at https://github.com/mengab/MGANet |
Tasks | Video Compression |
Published | 2018-11-22 |
URL | http://arxiv.org/abs/1811.09150v4 |
PDF | http://arxiv.org/pdf/1811.09150v4.pdf |
PWC | https://paperswithcode.com/paper/mganet-a-robust-model-for-quality-enhancement |
Repo | https://github.com/mengab/MGANet |
Framework | pytorch |
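The guided map steers the network toward coding-block boundaries, where blocking artifacts concentrate. Below is a hand-crafted illustration of such a map; the 8-pixel block size, the margin, and the binary construction are assumptions, and MGANet's actual guide is part of the learned pipeline rather than a fixed mask.

```python
import numpy as np

def block_boundary_guide(height, width, block=8, margin=1):
    """Hypothetical guide map: 1 for pixels within `margin` pixels of an
    8x8 coding-block boundary, 0 elsewhere."""
    ys = np.arange(height) % block
    xs = np.arange(width) % block
    near_y = (np.minimum(ys, block - ys) <= margin).astype(np.float32)
    near_x = (np.minimum(xs, block - xs) <= margin).astype(np.float32)
    return np.maximum(near_y[:, None], near_x[None, :])   # H x W map in {0, 1}

guide = block_boundary_guide(32, 32)
print(guide.shape, guide.mean())   # fraction of pixels flagged as near a boundary
```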
DeepFall – Non-invasive Fall Detection with Deep Spatio-Temporal Convolutional Autoencoders
Title | DeepFall – Non-invasive Fall Detection with Deep Spatio-Temporal Convolutional Autoencoders |
Authors | Jacob Nogas, Shehroz S. Khan, Alex Mihailidis |
Abstract | Human falls rarely occur; however, detecting falls is very important from the health and safety perspective. Due to the rarity of falls, it is difficult to employ supervised classification techniques to detect them. Moreover, in these highly skewed situations it is also difficult to extract domain specific features to identify falls. In this paper, we present a novel framework, DeepFall, which formulates the fall detection problem as an anomaly detection problem. The DeepFall framework presents the novel use of deep spatio-temporal convolutional autoencoders to learn spatial and temporal features from normal activities using non-invasive sensing modalities. We also present a new anomaly scoring method that combines the reconstruction scores of frames across a video sequence to detect unseen falls. We tested the DeepFall framework on three publicly available datasets collected through non-invasive sensing modalities (thermal and depth cameras) and show superior results in comparison to traditional autoencoder and convolutional autoencoder methods for identifying unseen falls. |
Tasks | Anomaly Detection |
Published | 2018-08-30 |
URL | http://arxiv.org/abs/1809.00977v2 |
PDF | http://arxiv.org/pdf/1809.00977v2.pdf |
PWC | https://paperswithcode.com/paper/deepfall-non-invasive-fall-detection-with |
Repo | https://github.com/JJN123/Fall-Detection |
Framework | tf |
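DeepFall scores anomalies by combining the per-frame reconstruction errors of the spatio-temporal autoencoder across a temporal window. A rough sketch of that aggregation step; the window length and the mean/max statistics are assumptions, not the paper's exact scoring.

```python
import numpy as np

def window_anomaly_scores(frame_errors, window=8):
    """Combine per-frame reconstruction errors into per-window anomaly
    scores. Each window is summarised by the mean and max of its frame
    errors; DeepFall's exact aggregation may differ."""
    errors = np.asarray(frame_errors, dtype=np.float64)
    scores = []
    for start in range(len(errors) - window + 1):
        w = errors[start:start + window]
        scores.append({"mean": w.mean(), "max": w.max()})
    return scores

# Hypothetical per-frame errors: mostly small, with a burst (a "fall") at the end.
errs = np.concatenate([0.1 + 0.02 * np.random.rand(40), 0.6 + 0.1 * np.random.rand(8)])
print(window_anomaly_scores(errs)[-1])   # the last window scores highest
```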
Morphosyntactic Tagging with a Meta-BiLSTM Model over Context Sensitive Token Encodings
Title | Morphosyntactic Tagging with a Meta-BiLSTM Model over Context Sensitive Token Encodings |
Authors | Bernd Bohnet, Ryan McDonald, Goncalo Simoes, Daniel Andor, Emily Pitler, Joshua Maynez |
Abstract | The rise of neural networks, and particularly recurrent neural networks, has produced significant advances in part-of-speech tagging accuracy. One characteristic common among these models is the presence of rich initial word encodings. These encodings typically are composed of a recurrent character-based representation with learned and pre-trained word embeddings. However, these encodings do not consider a context wider than a single word and it is only through subsequent recurrent layers that word or sub-word information interacts. In this paper, we investigate models that use recurrent neural networks with sentence-level context for initial character and word-based representations. In particular we show that optimal results are obtained by integrating these context sensitive representations through synchronized training with a meta-model that learns to combine their states. We present results on part-of-speech and morphological tagging with state-of-the-art performance on a number of languages. |
Tasks | Morphological Tagging, Part-Of-Speech Tagging, Word Embeddings |
Published | 2018-05-21 |
URL | http://arxiv.org/abs/1805.08237v1 |
PDF | http://arxiv.org/pdf/1805.08237v1.pdf |
PWC | https://paperswithcode.com/paper/morphosyntactic-tagging-with-a-meta-bilstm |
Repo | https://github.com/google/meta_tagger |
Framework | tf |
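The meta-model consumes the states of a character-level and a word-level encoder and learns to combine them into tag scores. A minimal numpy sketch of that combination step for a single token; the dimensions, the single hidden layer, and treating the encoder states as given are illustrative assumptions, and the synchronized training described in the abstract is omitted.

```python
import numpy as np

def meta_combine(char_state, word_state, W1, b1, W2, b2):
    """Combine a character-based and a word-based token representation and
    produce unnormalised tag scores via one hidden layer."""
    h = np.concatenate([char_state, word_state])   # fuse the two views
    h = np.maximum(0.0, W1 @ h + b1)               # hidden layer (ReLU)
    return W2 @ h + b2                             # tag scores

rng = np.random.default_rng(0)
char_state, word_state = rng.standard_normal(64), rng.standard_normal(128)
W1, b1 = rng.standard_normal((256, 192)) * 0.05, np.zeros(256)
W2, b2 = rng.standard_normal((17, 256)) * 0.05, np.zeros(17)   # e.g. 17 UPOS tags
print(meta_combine(char_state, word_state, W1, b1, W2, b2).argmax())
```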
Object Pose Estimation from Monocular Image using Multi-View Keypoint Correspondence
Title | Object Pose Estimation from Monocular Image using Multi-View Keypoint Correspondence |
Authors | Jogendra Nath Kundu, Rahul M. V., Aditya Ganeshan, R. Venkatesh Babu |
Abstract | Understanding the geometry and pose of objects in 2D images is a fundamental necessity for a wide range of real world applications. Driven by deep neural networks, recent methods have brought significant improvements to object pose estimation. However, they suffer due to scarcity of keypoint/pose-annotated real images and hence can not exploit the object’s 3D structural information effectively. In this work, we propose a data-efficient method which utilizes the geometric regularity of intraclass objects for pose estimation. First, we learn pose-invariant local descriptors of object parts from simple 2D RGB images. These descriptors, along with keypoints obtained from renders of a fixed 3D template model are then used to generate keypoint correspondence maps for a given monocular real image. Finally, a pose estimation network predicts 3D pose of the object using these correspondence maps. This pipeline is further extended to a multi-view approach, which assimilates keypoint information from correspondence sets generated from multiple views of the 3D template model. Fusion of multi-view information significantly improves geometric comprehension of the system which in turn enhances the pose estimation performance. Furthermore, use of correspondence framework responsible for the learning of pose invariant keypoint descriptor also allows us to effectively alleviate the data-scarcity problem. This enables our method to achieve state-of-the-art performance on multiple real-image viewpoint estimation datasets, such as Pascal3D+ and ObjectNet3D. To encourage reproducible research, we have released the codes for our proposed approach. |
Tasks | Pose Estimation, Viewpoint Estimation |
Published | 2018-09-03 |
URL | http://arxiv.org/abs/1809.00553v1 |
PDF | http://arxiv.org/pdf/1809.00553v1.pdf |
PWC | https://paperswithcode.com/paper/object-pose-estimation-from-monocular-image |
Repo | https://github.com/val-iisc/iSPA-Net |
Framework | none |
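A keypoint correspondence map can be pictured as the dense similarity between a template keypoint descriptor and per-pixel image descriptors. A toy sketch under that reading; the descriptors here are random placeholders, whereas the paper learns pose-invariant local descriptors and fuses maps from multiple template views.

```python
import numpy as np

def correspondence_map(image_desc, keypoint_desc):
    """Cosine-similarity map between one keypoint descriptor (D,) and
    dense image descriptors (H x W x D)."""
    img = image_desc / (np.linalg.norm(image_desc, axis=-1, keepdims=True) + 1e-8)
    kp = keypoint_desc / (np.linalg.norm(keypoint_desc) + 1e-8)
    return img @ kp                                  # H x W similarity map

rng = np.random.default_rng(0)
feats = rng.standard_normal((32, 32, 16))
kp = feats[10, 20] + 0.1 * rng.standard_normal(16)   # descriptor near pixel (10, 20)
cmap = correspondence_map(feats, kp)
print(np.unravel_index(cmap.argmax(), cmap.shape))   # peaks near (10, 20)
```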
Quootstrap: Scalable Unsupervised Extraction of Quotation-Speaker Pairs from Large News Corpora via Bootstrapping
Title | Quootstrap: Scalable Unsupervised Extraction of Quotation-Speaker Pairs from Large News Corpora via Bootstrapping |
Authors | Dario Pavllo, Tiziano Piccardi, Robert West |
Abstract | We propose Quootstrap, a method for extracting quotations, as well as the names of the speakers who uttered them, from large news corpora. Whereas prior work has addressed this problem primarily with supervised machine learning, our approach follows a fully unsupervised bootstrapping paradigm. It leverages the redundancy present in large news corpora, more precisely, the fact that the same quotation often appears across multiple news articles in slightly different contexts. Starting from a few seed patterns, such as [“Q”, said S.], our method extracts a set of quotation-speaker pairs (Q, S), which are in turn used for discovering new patterns expressing the same quotations; the process is then repeated with the larger pattern set. Our algorithm is highly scalable, which we demonstrate by running it on the large ICWSM 2011 Spinn3r corpus. Validating our results against a crowdsourced ground truth, we obtain 90% precision at 40% recall using a single seed pattern, with significantly higher recall values for more frequently reported (and thus likely more interesting) quotations. Finally, we showcase the usefulness of our algorithm’s output for computational social science by analyzing the sentiment expressed in our extracted quotations. |
Tasks | |
Published | 2018-04-07 |
URL | http://arxiv.org/abs/1804.02525v1 |
PDF | http://arxiv.org/pdf/1804.02525v1.pdf |
PWC | https://paperswithcode.com/paper/quootstrap-scalable-unsupervised-extraction |
Repo | https://github.com/epfl-dlab/quootstrap |
Framework | none |
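A tiny, self-contained sketch of the bootstrapping loop on four made-up "articles": seed patterns extract (quote, speaker) pairs, known pairs are used to mine new surface patterns from other sentences where they co-occur, and a mined pattern then recovers a pair the seed misses. The pattern scoring, deduplication, and corpus-scale Spark execution of the real system are omitted, and the regexes are assumptions.

```python
import re

# One seed pattern in the spirit of the paper's ["Q", said S.]; group names are ours.
SEED_PATTERNS = [re.compile(r'"(?P<Q>[^"]+)",? said (?P<S>[A-Z][a-z]+ [A-Z][a-z]+)')]

ARTICLES = [
    '"We will win", said Jane Doe at the rally.',
    'Speaking to reporters, Jane Doe stated: "We will win".',
    '"Budgets are tight", said John Roe yesterday.',
    'Speaking to reporters, Mary Poe stated: "Taxes will fall".',
]

def bootstrap(articles, patterns, rounds=2):
    """Extract (quote, speaker) pairs with the current patterns, then mine
    new surface patterns from sentences containing an already-known pair."""
    pairs, patterns = set(), list(patterns)
    for _ in range(rounds):
        for text in articles:                      # extraction step
            for pat in patterns:
                for m in pat.finditer(text):
                    pairs.add((m.group("Q"), m.group("S")))
        for quote, speaker in list(pairs):         # pattern-discovery step
            for text in articles:
                if quote in text and speaker in text:
                    template = text.replace(quote, '(?P<Q>[^"]+)').replace(
                        speaker, "(?P<S>[A-Z][a-z]+ [A-Z][a-z]+)")
                    patterns.append(re.compile(template))
    return pairs

print(bootstrap(ARTICLES, SEED_PATTERNS))   # Mary Poe's quote is found in round 2
```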
Towards Visual Feature Translation
Title | Towards Visual Feature Translation |
Authors | Jie Hu, Rongrong Ji, Hong Liu, Shengchuan Zhang, Cheng Deng, Qi Tian |
Abstract | Most existing visual search systems are deployed based upon fixed kinds of visual features, which prohibits the feature reusing across different systems or when upgrading systems with a new type of feature. Such a setting is obviously inflexible and time/memory consuming, which is indeed mendable if visual features can be “translated” across systems. In this paper, we make the first attempt towards visual feature translation to break through the barrier of using features across different visual search systems. To this end, we propose a Hybrid Auto-Encoder (HAE) to translate visual features, which learns a mapping by minimizing the translation and reconstruction errors. Based upon HAE, an Undirected Affinity Measurement (UAM) is further designed to quantify the affinity among different types of visual features. Extensive experiments have been conducted on several public datasets with sixteen different types of widely-used features in visual search systems. Quantitative results show the encouraging possibilities of feature translation. For the first time, the affinity among widely-used features like SIFT and DELF is reported. |
Tasks | |
Published | 2018-12-03 |
URL | http://arxiv.org/abs/1812.00573v2 |
PDF | http://arxiv.org/pdf/1812.00573v2.pdf |
PWC | https://paperswithcode.com/paper/towards-visual-feature-translation |
Repo | https://github.com/hujiecpp/VisualFeatureTranslation |
Framework | none |
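As the abstract describes, the Hybrid Auto-Encoder is trained by jointly minimizing a translation error (the source feature mapped into the target feature space) and a reconstruction error of the source feature. A sketch of such a combined objective; the plain L2 terms and the weighting are assumptions.

```python
import numpy as np

def hae_style_loss(src, tgt, translated, reconstructed, alpha=1.0):
    """Illustrative HAE-style objective: translation error in the target
    feature space plus reconstruction error of the source feature."""
    translation_err = np.mean((translated - tgt) ** 2)
    reconstruction_err = np.mean((reconstructed - src) ** 2)
    return translation_err + alpha * reconstruction_err

rng = np.random.default_rng(0)
src, tgt = rng.standard_normal(512), rng.standard_normal(128)   # e.g. SIFT-like -> DELF-like
print(hae_style_loss(src, tgt, translated=tgt + 0.1, reconstructed=src + 0.05))
```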
Action Recognition for Depth Video using Multi-view Dynamic Images
Title | Action Recognition for Depth Video using Multi-view Dynamic Images |
Authors | Yang Xiao, Jun Chen, Yancheng Wang, Zhiguo Cao, Joey Tianyi Zhou, Xiang Bai |
Abstract | Dynamic imaging is a recently proposed action description paradigm for simultaneously capturing motion and temporal evolution information, particularly in the context of deep convolutional neural networks (CNNs). Compared with optical flow for motion characterization, dynamic imaging exhibits superior efficiency and compactness. Inspired by the success of dynamic imaging in RGB video, this study extends it to the depth domain. To better exploit three-dimensional (3D) characteristics, multi-view dynamic images are proposed. In particular, the raw depth video is densely projected with respect to different virtual imaging viewpoints by rotating the virtual camera within the 3D space. Subsequently, dynamic images are extracted from the obtained multi-view depth videos and multi-view dynamic images are thus constructed from these images. Accordingly, more view-tolerant visual cues can be involved. A novel CNN model is then proposed to perform feature learning on multi-view dynamic images. Particularly, the dynamic images from different views share the same convolutional layers but correspond to different fully connected layers. This is aimed at enhancing the tuning effectiveness on shallow convolutional layers by alleviating the gradient vanishing problem. Moreover, as the spatial occurrence variation of the actions may impair the CNN, an action proposal approach is also put forth. In experiments, the proposed approach can achieve state-of-the-art performance on three challenging datasets. |
Tasks | Optical Flow Estimation, Temporal Action Localization |
Published | 2018-06-29 |
URL | http://arxiv.org/abs/1806.11269v3 |
PDF | http://arxiv.org/pdf/1806.11269v3.pdf |
PWC | https://paperswithcode.com/paper/action-recognition-for-depth-video-using |
Repo | https://github.com/3huo/MVDI |
Framework | none |
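A dynamic image collapses a clip into a single frame via rank pooling; a common approximation weights frame t by 2t - T - 1. The sketch below applies that weighting to one (here random) depth clip; the multi-view method would repeat this for each virtual viewpoint, and the paper's exact pooling may differ from this approximation.

```python
import numpy as np

def dynamic_image(frames):
    """Collapse a (T, H, W) clip into one dynamic image using the common
    approximate-rank-pooling weights alpha_t = 2t - T - 1 (t = 1..T)."""
    frames = np.asarray(frames, dtype=np.float64)
    T = frames.shape[0]
    alphas = 2.0 * np.arange(1, T + 1) - T - 1              # early frames weighted negatively
    di = np.tensordot(alphas, frames, axes=1)               # weighted sum over time
    return (di - di.min()) / (di.max() - di.min() + 1e-8)   # rescale for visualisation

clip = np.random.rand(16, 64, 64)   # stand-in for a re-projected depth clip
print(dynamic_image(clip).shape)
```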
General-purpose Tagging of Freesound Audio with AudioSet Labels: Task Description, Dataset, and Baseline
Title | General-purpose Tagging of Freesound Audio with AudioSet Labels: Task Description, Dataset, and Baseline |
Authors | Eduardo Fonseca, Manoj Plakal, Frederic Font, Daniel P. W. Ellis, Xavier Favory, Jordi Pons, Xavier Serra |
Abstract | This paper describes Task 2 of the DCASE 2018 Challenge, titled “General-purpose audio tagging of Freesound content with AudioSet labels”. This task was hosted on the Kaggle platform as “Freesound General-Purpose Audio Tagging Challenge”. The goal of the task is to build an audio tagging system that can recognize the category of an audio clip from a subset of 41 diverse categories drawn from the AudioSet Ontology. We present the task, the dataset prepared for the competition, and a baseline system. |
Tasks | Audio Tagging |
Published | 2018-07-26 |
URL | http://arxiv.org/abs/1807.09902v3 |
PDF | http://arxiv.org/pdf/1807.09902v3.pdf |
PWC | https://paperswithcode.com/paper/general-purpose-tagging-of-freesound-audio |
Repo | https://github.com/asudomoeva/Audio-Tagging |
Framework | tf |
Computing CNN Loss and Gradients for Pose Estimation with Riemannian Geometry
Title | Computing CNN Loss and Gradients for Pose Estimation with Riemannian Geometry |
Authors | Benjamin Hou, Nina Miolane, Bishesh Khanal, Matthew C. H. Lee, Amir Alansary, Steven McDonagh, Jo V. Hajnal, Daniel Rueckert, Ben Glocker, Bernhard Kainz |
Abstract | Pose estimation, i.e. predicting a 3D rigid transformation, in SE(3), with respect to a fixed co-ordinate frame, is an omnipresent problem in medical image analysis with applications such as: image rigid registration, anatomical standard plane detection, tracking and device/camera pose estimation. Deep learning methods often parameterise a pose with a representation that separates rotation and translation. As commonly available frameworks do not provide means to calculate loss on a manifold, regression is usually performed using the L2-norm independently on the rotation’s and the translation’s parameterisations, which is a metric for linear spaces that does not take into account the Lie group structure of SE(3). In this paper, we propose a general Riemannian formulation of the pose estimation problem. We propose to train the CNN directly on SE(3) equipped with a left-invariant Riemannian metric, coupling the prediction of the translation and rotation defining the pose. At each training step, the ground truth and predicted pose are elements of the manifold, where the loss is calculated as the Riemannian geodesic distance. We then compute the optimisation direction by back-propagating the gradient with respect to the predicted pose on the tangent space of the manifold SE(3) and update the network weights. We thoroughly evaluate the effectiveness of our loss function by comparing its performance with popular and most commonly used existing methods, on tasks such as image-based localisation and intensity-based 2D/3D registration. We also show that hyper-parameters, used in our loss function to weight the contribution between rotations and translations, can be intrinsically calculated from the dataset to achieve greater performance margins. |
Tasks | Pose Estimation |
Published | 2018-05-02 |
URL | http://arxiv.org/abs/1805.01026v3 |
PDF | http://arxiv.org/pdf/1805.01026v3.pdf |
PWC | https://paperswithcode.com/paper/computing-cnn-loss-and-gradients-for-pose |
Repo | https://github.com/farrell236/SVRnet |
Framework | tf |
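One way to picture a loss that couples rotation and translation is to penalize the twist obtained from the matrix logarithm of the relative transform. The sketch below does that with scipy; the weighting is an illustrative stand-in, not the paper's specific left-invariant metric, and the manifold gradient computation is not shown.

```python
import numpy as np
from scipy.linalg import expm, logm

def se3_geodesic_loss(T_pred, T_gt, w_rot=1.0, w_trans=1.0):
    """Penalise the se(3) twist of the relative transform between the
    predicted and ground-truth poses (4x4 homogeneous matrices)."""
    T_rel = np.linalg.inv(T_gt) @ T_pred
    xi = np.real(logm(T_rel))              # 4x4 twist matrix (se(3) element)
    omega = xi[:3, :3]                     # skew-symmetric rotational part
    v = xi[:3, 3]                          # translational part
    return w_rot * np.linalg.norm(omega) ** 2 + w_trans * np.linalg.norm(v) ** 2

def make_pose(rz, t):
    """Build an SE(3) matrix from a z-rotation angle and a translation."""
    xi = np.zeros((4, 4)); xi[0, 1], xi[1, 0] = -rz, rz; xi[:3, 3] = t
    return expm(xi)

T_gt = np.eye(4)
T_pred = make_pose(0.1, [0.02, 0.0, -0.01])   # small rotation + translation error
print(se3_geodesic_loss(T_pred, T_gt))
```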
A Flexible and Adaptive Framework for Abstention Under Class Imbalance
Title | A Flexible and Adaptive Framework for Abstention Under Class Imbalance |
Authors | Avanti Shrikumar, Amr Alexandari, Anshul Kundaje |
Abstract | In practical applications of machine learning, it is often desirable to identify and abstain on examples where the model’s predictions are likely to be incorrect. Much of the prior work on this topic focused on out-of-distribution detection or performance metrics such as top-k accuracy. Comparatively little attention was given to metrics such as area-under-the-curve or Cohen’s Kappa, which are extremely relevant for imbalanced datasets. Abstention strategies aimed at top-k accuracy can produce poor results on these metrics when applied to imbalanced datasets, even when all examples are in-distribution. We propose a framework to address this gap. Our framework leverages the insight that calibrated probability estimates can be used as a proxy for the true class labels, thereby allowing us to estimate the change in an arbitrary metric if an example were abstained on. Using this framework, we derive computationally efficient metric-specific abstention algorithms for optimizing the sensitivity at a target specificity level, the area under the ROC, and the weighted Cohen’s Kappa. Because our method relies only on calibrated probability estimates, we further show that by leveraging recent work on domain adaptation under label shift, we can generalize to test-set distributions that may have a different class imbalance compared to the training set distribution. On various experiments involving medical imaging, natural language processing, computer vision and genomics, we demonstrate the effectiveness of our approach. Source code available at https://github.com/blindauth/abstention. Colab notebooks reproducing results available at https://github.com/blindauth/abstention_experiments. |
Tasks | Domain Adaptation, Out-of-Distribution Detection |
Published | 2018-02-20 |
URL | https://arxiv.org/abs/1802.07024v4 |
PDF | https://arxiv.org/pdf/1802.07024v4.pdf |
PWC | https://paperswithcode.com/paper/selective-classification-via-curve |
Repo | https://github.com/kundajelab/abstention |
Framework | none |
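The core idea is that calibrated probabilities act as proxy labels, so the effect of abstaining on any example can be estimated before the true labels are known. The simplest instance is expected accuracy, sketched below; the paper's estimators for sensitivity at a target specificity, auROC, and weighted Cohen's kappa are more involved.

```python
import numpy as np

def expected_accuracy_after_abstention(calibrated_probs, abstain_frac=0.2):
    """Treat calibrated probabilities as proxy labels: the expected
    per-example accuracy is max_k p_k, so abstaining on the lowest-confidence
    examples maximises the estimated accuracy on the retained set."""
    confidence = calibrated_probs.max(axis=1)        # expected correctness per example
    n_keep = int(round(len(confidence) * (1 - abstain_frac)))
    keep = np.argsort(confidence)[::-1][:n_keep]     # retain the most confident
    return confidence[keep].mean(), np.sort(keep)

rng = np.random.default_rng(0)
probs = rng.dirichlet(alpha=[1.0, 1.0, 1.0], size=100)   # pretend these are calibrated
est_acc, kept = expected_accuracy_after_abstention(probs)
print(round(est_acc, 3), len(kept))
```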
A Shared Attention Mechanism for Interpretation of Neural Automatic Post-Editing Systems
Title | A Shared Attention Mechanism for Interpretation of Neural Automatic Post-Editing Systems |
Authors | Inigo Jauregi Unanue, Ehsan Zare Borzeshi, Massimo Piccardi |
Abstract | Automatic post-editing (APE) systems aim to correct the systematic errors made by machine translators. In this paper, we propose a neural APE system that encodes the source (src) and machine translated (mt) sentences with two separate encoders, but leverages a shared attention mechanism to better understand how the two inputs contribute to the generation of the post-edited (pe) sentences. Our empirical observations have shown that when the mt is incorrect, the attention shifts weight toward tokens in the src sentence to properly edit the incorrect translation. The model has been trained and evaluated on the official data from the WMT16 and WMT17 APE IT domain English-German shared tasks. Additionally, we have used the extra 500K artificial data provided by the shared task. Our system has been able to reproduce the accuracies of systems trained with the same data, while at the same time providing better interpretability. |
Tasks | Automatic Post-Editing |
Published | 2018-07-01 |
URL | http://arxiv.org/abs/1807.00248v1 |
PDF | http://arxiv.org/pdf/1807.00248v1.pdf |
PWC | https://paperswithcode.com/paper/a-shared-attention-mechanism-for |
Repo | https://github.com/ijauregiCMCRC/Shared_Attention_for_APE |
Framework | pytorch |
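A rough sketch of a single attention module shared across the src and mt encoder states, so one weight vector shows which input the decoder leans on at each step. The bilinear scorer and all shapes are assumptions rather than the paper's exact architecture.

```python
import numpy as np

def shared_attention(dec_state, src_states, mt_states, W):
    """One attention over the concatenation of src and mt encoder states.
    Returns the context vector plus the weights split by input side."""
    keys = np.concatenate([src_states, mt_states], axis=0)   # (Ls + Lm, d)
    scores = keys @ W @ dec_state                            # bilinear attention scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    context = weights @ keys                                 # attended context vector
    return context, weights[: len(src_states)], weights[len(src_states):]

rng = np.random.default_rng(0)
d = 16
src, mt, dec = rng.standard_normal((5, d)), rng.standard_normal((7, d)), rng.standard_normal(d)
ctx, w_src, w_mt = shared_attention(dec, src, mt, W=np.eye(d))
print(ctx.shape, float(w_src.sum() + w_mt.sum()))            # weights sum to 1
```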
Detecting Irregular Patterns in IoT Streaming Data for Fall Detection
Title | Detecting Irregular Patterns in IoT Streaming Data for Fall Detection |
Authors | Sazia Mahfuz, Haruna Isah, Farhana Zulkernine, Peter Nicholls |
Abstract | Detecting patterns in real time streaming data has been an interesting and challenging data analytics problem. With the proliferation of a variety of sensor devices, real-time analytics of data from the Internet of Things (IoT) to learn regular and irregular patterns has become an important machine learning problem to enable predictive analytics for automated notification and decision support. In this work, we address the problem of learning an irregular human activity pattern, fall, from streaming IoT data from wearable sensors. We present a deep neural network model for detecting fall based on accelerometer data giving 98.75 percent accuracy using an online physical activity monitoring dataset called “MobiAct”, which was published by Vavoulas et al. The initial model was developed using IBM Watson studio and then later transferred and deployed on IBM Cloud with the streaming analytics service supported by IBM Streams for monitoring real-time IoT data. We also present the systems architecture of the real-time fall detection framework that we intend to use with mbientlabs wearable health monitoring sensors for real time patient monitoring at retirement homes or rehabilitation clinics. |
Tasks | |
Published | 2018-11-16 |
URL | http://arxiv.org/abs/1811.06672v1 |
PDF | http://arxiv.org/pdf/1811.06672v1.pdf |
PWC | https://paperswithcode.com/paper/detecting-irregular-patterns-in-iot-streaming |
Repo | https://github.com/SaziaM/IEMCON2018 |
Framework | none |
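A small sketch of the kind of sliding-window preprocessing that turns a tri-axial accelerometer stream into fixed-size inputs for a fall-detection model. The sampling rate, window length, overlap, and magnitude statistics are assumptions, not the paper's MobiAct pipeline or its deployed IBM Streams setup.

```python
import numpy as np

def accel_windows(ax, ay, az, fs=50, win_s=2.0, step_s=0.5):
    """Slice tri-axial accelerometer streams into overlapping windows and
    compute simple magnitude statistics per window."""
    mag = np.sqrt(np.asarray(ax) ** 2 + np.asarray(ay) ** 2 + np.asarray(az) ** 2)
    win, step = int(fs * win_s), int(fs * step_s)
    feats = []
    for start in range(0, len(mag) - win + 1, step):
        w = mag[start:start + win]
        feats.append([w.mean(), w.std(), w.max(), w.min()])
    return np.array(feats)                      # (n_windows, 4) feature matrix

rng = np.random.default_rng(0)
ax, ay, az = rng.standard_normal((3, 500)) * 0.1
az += 1.0                                       # gravity on z, in units of g
print(accel_windows(ax, ay, az).shape)
```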