January 31, 2020

3378 words 16 mins read

Paper Group AWR 378

Preference-based Interactive Multi-Document Summarisation

Title Preference-based Interactive Multi-Document Summarisation
Authors Yang Gao, Christian M. Meyer, Iryna Gurevych
Abstract Interactive NLP is a promising paradigm to close the gap between automatic NLP systems and the human upper bound. Preference-based interactive learning has been successfully applied, but the existing methods require several thousand interaction rounds even in simulations with perfect user feedback. In this paper, we study preference-based interactive summarisation. To reduce the number of interaction rounds, we propose the Active Preference-based ReInforcement Learning (APRIL) framework. APRIL uses Active Learning to query the user, Preference Learning to learn a summary ranking function from the preferences, and neural Reinforcement Learning to efficiently search for the (near-)optimal summary. Our results show that users can easily provide reliable preferences over summaries and that APRIL outperforms the state-of-the-art preference-based interactive method in both simulation and real-user experiments.
Tasks Active Learning
Published 2019-06-07
URL https://arxiv.org/abs/1906.02923v1
PDF https://arxiv.org/pdf/1906.02923v1.pdf
PWC https://paperswithcode.com/paper/preference-based-interactive-multi-document
Repo https://github.com/UKPLab/irj-neural-april
Framework none
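
The APRIL loop couples three standard components: active querying, preference learning, and reinforcement learning. As a rough illustration of the preference-learning piece alone (not the authors' exact model), a linear Bradley-Terry ranker can be fit from preference pairs over summary feature vectors; feature extraction and the active-learning query strategy are assumed away here.

```python
import numpy as np

def train_preference_ranker(pairs, dim, lr=0.1, epochs=100):
    """Fit a linear Bradley-Terry ranking function w from preference pairs.

    pairs: list of (x_preferred, x_other) summary feature vectors.
    Trained so that sigmoid(w @ (x_pref - x_other)) -> 1, i.e. the
    preferred summary scores higher under w @ x.
    """
    w = np.zeros(dim)
    for _ in range(epochs):
        for x_pref, x_other in pairs:
            d = x_pref - x_other
            p = 1.0 / (1.0 + np.exp(-w @ d))  # P(preferred beats other)
            w += lr * (1.0 - p) * d           # ascend the log-likelihood
    return w

# Toy usage with random 5-d "summary features":
rng = np.random.default_rng(0)
pairs = [(rng.normal(size=5), rng.normal(size=5)) for _ in range(20)]
w = train_preference_ranker(pairs, dim=5)
```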

Convolutional neural network models for cancer type prediction based on gene expression

Title Convolutional neural network models for cancer type prediction based on gene expression
Authors Milad Mostavi, Yu-Chiao Chiu, Yufei Huang, Yidong Chen
Abstract Background: Precise prediction of cancer types is vital for cancer diagnosis and therapy, and important cancer marker genes can be inferred through predictive models. Several studies have attempted to build machine learning models for this task; however, none has taken into consideration the effects of tissue of origin, which can potentially bias the identification of cancer markers. Results: In this paper, we introduce several Convolutional Neural Network (CNN) models that take unstructured gene expression inputs to classify tumor and non-tumor samples into their designated cancer types or as normal. Based on different designs of gene embeddings and convolution schemes, we implemented three CNN models: 1D-CNN, 2D-Vanilla-CNN, and 2D-Hybrid-CNN. The models were trained and tested on a combined set of 10,340 samples of 33 cancer types and 731 matched normal tissues from The Cancer Genome Atlas (TCGA). Our models achieved excellent prediction accuracies (93.9-95.0%) among 34 classes (33 cancers and normal). Furthermore, we interpreted the 1D-CNN model with a guided saliency technique and identified a total of 2,090 cancer markers (108 per class). The concordance of differential expression of these markers between the cancer types they represent and the others was confirmed. In breast cancer, for instance, our model identified well-known markers such as GATA3 and ESR1. Finally, we extended the 1D-CNN model to predict breast cancer subtypes and achieved an average accuracy of 88.42% among 5 subtypes. The code can be found at https://github.com/chenlabgccri/CancerTypePrediction.
Tasks
Published 2019-06-18
URL https://arxiv.org/abs/1906.07794v1
PDF https://arxiv.org/pdf/1906.07794v1.pdf
PWC https://paperswithcode.com/paper/convolutional-neural-network-models-for
Repo https://github.com/chenlabgccri/CancerTypePrediction
Framework none
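
For flavor, a minimal PyTorch sketch of a 1D-CNN over an expression vector follows; the layer sizes and pooling choices are placeholders, not the authors' published 1D-CNN architecture (see their repo for that).

```python
import torch
import torch.nn as nn

class GeneExpr1DCNN(nn.Module):
    """Rough sketch of a 1D-CNN over a gene-expression vector.

    Input: (batch, 1, n_genes); output: logits over 34 classes.
    """
    def __init__(self, n_genes=20000, n_classes=34):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=9, stride=2), nn.ReLU(),
            nn.MaxPool1d(4),
            nn.Conv1d(32, 64, kernel_size=9, stride=2), nn.ReLU(),
            nn.AdaptiveMaxPool1d(8),  # fixed-length summary regardless of n_genes
        )
        self.classifier = nn.Linear(64 * 8, n_classes)

    def forward(self, x):
        h = self.features(x)
        return self.classifier(h.flatten(1))

# expr = torch.randn(16, 1, 20000)   # batch of 16 expression profiles
# logits = GeneExpr1DCNN()(expr)     # (16, 34)
```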

Deep Interpretable Non-Rigid Structure from Motion

Title Deep Interpretable Non-Rigid Structure from Motion
Authors Chen Kong, Simon Lucey
Abstract All current non-rigid structure from motion (NRSfM) algorithms are limited with respect to (i) the number of images and (ii) the type of shape variability they can handle. This has hampered the practical utility of NRSfM for many applications within vision. In this paper we propose a novel deep neural network to recover camera poses and 3D points solely from an ensemble of 2D image coordinates. The proposed neural network is mathematically interpretable as a multi-layer block sparse dictionary learning problem, and can handle problems of unprecedented scale and shape complexity. Extensive experiments demonstrate the impressive performance of our approach, which exhibits superior precision and robustness compared to all available state-of-the-art works. The considerable model capacity of our approach affords remarkable generalization to unseen data. We propose a quality measure (based on the network weights) which circumvents the need for 3D ground truth to ascertain the confidence we have in the reconstruction. Once the network's weights are estimated (for a non-rigid object), we show how our approach can effectively recover 3D shape from a single image, outperforming comparable methods that rely on direct 3D supervision.
Tasks Dictionary Learning
Published 2019-02-28
URL http://arxiv.org/abs/1902.10840v1
PDF http://arxiv.org/pdf/1902.10840v1.pdf
PWC https://paperswithcode.com/paper/deep-interpretable-non-rigid-structure-from
Repo https://github.com/kongchen1992/deep-nrsfm
Framework tf
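
The network is interpreted as multi-layer block sparse dictionary learning; as background only, a single-layer sparse-coding step can be solved with ISTA. This numpy sketch illustrates that primitive, not the paper's multi-layer block-sparse architecture.

```python
import numpy as np

def ista(w, D, lam=0.1, step=None, iters=200):
    """Single-layer sparse coding via ISTA: argmin_z ||w - D z||^2 + lam ||z||_1.

    w: measurement vector (e.g. flattened 2D coordinates), D: dictionary (m, k).
    """
    if step is None:
        step = 1.0 / np.linalg.norm(D, 2) ** 2  # 1/L, L = Lipschitz const of gradient
    z = np.zeros(D.shape[1])
    for _ in range(iters):
        grad = D.T @ (D @ z - w)
        z = z - step * grad
        z = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft-threshold
    return z
```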

A Multi-cascaded Deep Model for Bilingual SMS Classification

Title A Multi-cascaded Deep Model for Bilingual SMS Classification
Authors Muhammad Haroon Shakeel, Asim Karim, Imdadullah Khan
Abstract Most studies on text classification focus on the English language. However, short texts such as SMS messages are influenced by regional languages, which makes automatic text classification challenging due to the multilingual, informal, and noisy nature of the text. In this work, we propose a novel multi-cascaded deep learning model called McM for bilingual SMS classification. McM exploits $n$-gram-level information as well as long-term dependencies of text for learning. Our approach aims to learn a model without any code-switching indication, lexical normalization, language translation, or language transliteration. The model relies entirely upon the text, as no external knowledge base is utilized for learning. For this purpose, a 12-class bilingual text dataset is developed from SMS feedback from citizens on public services, containing mixed Roman Urdu and English. Our model achieves high classification accuracy on this dataset and outperforms the previous model for multilingual text classification, highlighting the language independence of McM.
Tasks Lexical Normalization, Multilingual text classification, Text Classification, Transliteration
Published 2019-11-29
URL https://arxiv.org/abs/1911.13066v1
PDF https://arxiv.org/pdf/1911.13066v1.pdf
PWC https://paperswithcode.com/paper/a-multi-cascaded-deep-model-for-bilingual-sms
Repo https://github.com/haroonshakeel/bilingual_sms_classification
Framework none
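
As a loose sketch of the multi-cascaded idea, assuming illustrative dimensions throughout, one can fuse a CNN cascade (n-gram cues) with a bidirectional LSTM cascade (long-range dependencies) before a shared classifier; the real McM differs in its cascade design.

```python
import torch
import torch.nn as nn

class McMSketch(nn.Module):
    """Illustrative two-cascade text classifier: a CNN cascade for n-gram
    cues plus an LSTM cascade for long-range dependencies, fused before
    the classifier. Dimensions are assumptions, not the paper's exact McM.
    """
    def __init__(self, vocab=30000, emb=128, n_classes=12):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.conv = nn.Sequential(nn.Conv1d(emb, 128, kernel_size=3, padding=1),
                                  nn.ReLU(), nn.AdaptiveMaxPool1d(1))
        self.lstm = nn.LSTM(emb, 64, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(128 + 2 * 64, n_classes)

    def forward(self, tokens):                        # tokens: (batch, seq_len)
        e = self.embed(tokens)                        # (B, T, emb)
        c = self.conv(e.transpose(1, 2)).squeeze(-1)  # (B, 128) n-gram features
        _, (h, _) = self.lstm(e)                      # h: (2, B, 64)
        r = torch.cat([h[0], h[1]], dim=1)            # (B, 128) sequence features
        return self.fc(torch.cat([c, r], dim=1))
```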

An Open-Source Framework for Adaptive Traffic Signal Control

Title An Open-Source Framework for Adaptive Traffic Signal Control
Authors Wade Genders, Saiedeh Razavi
Abstract Sub-optimal control policies in transportation systems negatively impact mobility, the environment, and human health. Developing optimal transportation control systems at the appropriate scale can be difficult, as cities' transportation systems can be large, complex, and stochastic. Intersection traffic signal controllers are an important element of modern transportation infrastructure where sub-optimal control policies can incur high costs to many users. Many adaptive traffic signal controllers have been proposed by the community, but research is lacking regarding their relative performance: which adaptive traffic signal controller is best remains an open question. This research contributes a framework for developing and evaluating different adaptive traffic signal controller models in simulation, both learning and non-learning, and demonstrates its capabilities. The framework is used first to investigate the performance variance of the modelled adaptive traffic signal controllers with respect to their hyperparameters, and second to analyze the performance differences between controllers with optimal hyperparameters. The proposed framework contains implementations of some of the most popular adaptive traffic signal controllers from the literature: Webster's, Max-pressure, and Self-Organizing Traffic Lights, along with deep Q-network and deep deterministic policy gradient reinforcement learning controllers. This framework will aid researchers by accelerating their work from a common starting point, allowing them to generate results faster with less effort. All framework source code is available at https://github.com/docwza/sumolights.
Tasks
Published 2019-09-01
URL https://arxiv.org/abs/1909.00395v1
PDF https://arxiv.org/pdf/1909.00395v1.pdf
PWC https://paperswithcode.com/paper/an-open-source-framework-for-adaptive-traffic
Repo https://github.com/docwza/sumolights
Framework none
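
Among the implemented controllers, Max-pressure has a particularly compact decision rule; a minimal sketch follows, with the SUMO glue code omitted and hypothetical lane names.

```python
def max_pressure_phase(phases, queue):
    """Pick the signal phase with maximum 'pressure'.

    phases: dict phase -> list of (inbound_lane, outbound_lane) movements
    queue:  dict lane -> current queue length
    """
    def pressure(movements):
        return sum(queue[i] - queue[o] for i, o in movements)
    return max(phases, key=lambda p: pressure(phases[p]))

# Example: two phases at a toy intersection.
queue = {"N_in": 8, "S_in": 6, "E_in": 2, "W_in": 1,
         "N_out": 0, "S_out": 1, "E_out": 3, "W_out": 0}
phases = {"NS": [("N_in", "S_out"), ("S_in", "N_out")],
          "EW": [("E_in", "W_out"), ("W_in", "E_out")]}
print(max_pressure_phase(phases, queue))  # -> "NS"
```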

Black is to Criminal as Caucasian is to Police: Detecting and Removing Multiclass Bias in Word Embeddings

Title Black is to Criminal as Caucasian is to Police: Detecting and Removing Multiclass Bias in Word Embeddings
Authors Thomas Manzini, Yao Chong Lim, Yulia Tsvetkov, Alan W Black
Abstract Online texts – across genres, registers, domains, and styles – are riddled with human stereotypes, expressed in overt or subtle ways. Word embeddings trained on these texts perpetuate and amplify these stereotypes, and propagate biases to machine learning models that use word embeddings as features. In this work, we propose a method to debias word embeddings in multiclass settings such as race and religion, extending the work of Bolukbasi et al. (2016) from binary settings such as binary gender. Next, we propose a novel methodology for the evaluation of multiclass debiasing. We demonstrate that our multiclass debiasing is robust and maintains efficacy on standard NLP tasks.
Tasks Word Embeddings
Published 2019-04-03
URL https://arxiv.org/abs/1904.04047v3
PDF https://arxiv.org/pdf/1904.04047v3.pdf
PWC https://paperswithcode.com/paper/black-is-to-criminal-as-caucasian-is-to
Repo https://github.com/TManzini/DebiasMulticlassWordEmbedding
Framework pytorch
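
The hard-debiasing recipe generalizes to multiple classes by estimating a bias subspace from several definitional word sets and projecting it out. A numpy sketch roughly following that recipe (the equalization step of the full method is left out):

```python
import numpy as np

def bias_subspace(def_sets, k=2):
    """PCA of mean-centred definitional sets (e.g. sets of race or
    religion terms). def_sets: list of arrays, each (n_words, dim)."""
    centred = np.vstack([s - s.mean(axis=0) for s in def_sets])
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return vt[:k]                      # top-k bias directions, (k, dim)

def neutralize(vecs, B):
    """Remove the bias-subspace component from each embedding row."""
    return vecs - (vecs @ B.T) @ B
```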

Beyond Human Parts: Dual Part-Aligned Representations for Person Re-Identification

Title Beyond Human Parts: Dual Part-Aligned Representations for Person Re-Identification
Authors Jianyuan Guo, Yuhui Yuan, Lang Huang, Chao Zhang, Jinge Yao, Kai Han
Abstract Person re-identification is a challenging task due to various complex factors. Recent studies have attempted to integrate human parsing results or externally defined attributes to help capture human parts or important object regions. On the other hand, there still exist many useful contextual cues that do not fall into the scope of predefined human parts or attributes. In this paper, we address the missed contextual cues by exploiting both the accurate human parts and the coarse non-human parts. In our implementation, we apply a human parsing model to extract the binary human part masks \emph{and} a self-attention mechanism to capture the soft latent (non-human) part masks. We verify the effectiveness of our approach with new state-of-the-art performances on three challenging benchmarks: Market-1501, DukeMTMC-reID and CUHK03. Our implementation is available at https://github.com/ggjy/P2Net.pytorch.
Tasks Human Parsing, Person Re-Identification
Published 2019-10-22
URL https://arxiv.org/abs/1910.10111v1
PDF https://arxiv.org/pdf/1910.10111v1.pdf
PWC https://paperswithcode.com/paper/beyond-human-parts-dual-part-aligned
Repo https://github.com/ggjy/P2Net.pytorch
Framework pytorch
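
A hedged sketch of the dual pooling idea, with layer choices that are assumptions rather than the exact P2-Net blocks:

```python
import torch
import torch.nn as nn

class DualPartPooling(nn.Module):
    """Pool a feature map under (i) binary human-part masks from an external
    parser and (ii) soft latent (non-human) masks predicted by a 1x1-conv
    attention head, yielding one descriptor per part.
    """
    def __init__(self, channels, n_latent=4):
        super().__init__()
        self.latent = nn.Conv2d(channels, n_latent, kernel_size=1)

    @staticmethod
    def masked_avg(feat, masks):            # feat (B,C,H,W), masks (B,P,H,W)
        num = torch.einsum("bchw,bphw->bpc", feat, masks)
        den = masks.sum(dim=(2, 3)).clamp(min=1e-6).unsqueeze(-1)
        return num / den                    # (B, P, C): one vector per part

    def forward(self, feat, human_masks):
        a = self.latent(feat)                          # (B, n_latent, H, W)
        soft = a.flatten(2).softmax(-1).view(a.shape)  # spatial softmax masks
        return self.masked_avg(feat, human_masks), self.masked_avg(feat, soft)
```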

A Context-Aware Loss Function for Action Spotting in Soccer Videos

Title A Context-Aware Loss Function for Action Spotting in Soccer Videos
Authors Anthony Cioppa, Adrien Deliège, Silvio Giancola, Bernard Ghanem, Marc Van Droogenbroeck, Rikke Gade, Thomas B. Moeslund
Abstract In video understanding, action spotting consists of temporally localizing human-induced events annotated with single timestamps. In this paper, we propose a novel loss function that explicitly accounts for the temporal context naturally present around each action, rather than focusing on the single annotated frame to spot. We benchmark our loss on a large dataset of soccer videos, SoccerNet, and achieve an improvement of 12.8% over the baseline. We show the generalization capability of our loss for generic activity proposals and detection on ActivityNet, by spotting the beginning and the end of each activity. Furthermore, we provide an extended ablation study and display challenging cases for action spotting in soccer videos. Finally, we qualitatively illustrate how our loss induces a precise temporal understanding of actions and show how such semantic knowledge can be used for automatic highlights generation.
Tasks Action Spotting, Video Understanding
Published 2019-12-03
URL https://arxiv.org/abs/1912.01326v3
PDF https://arxiv.org/pdf/1912.01326v3.pdf
PWC https://paperswithcode.com/paper/a-context-aware-loss-function-for-action
Repo https://github.com/cioppaanthony/context-aware-loss
Framework tf
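
As a toy illustration only, one way to encode "temporal context around an annotated timestamp" is a per-frame loss weight that treats the annotated frame as a confident positive, its immediate neighbourhood as ambiguous, and distant frames as confident negatives; the paper's actual loss slices the context more finely than this.

```python
import numpy as np

def context_weights(n_frames, action_frames, radius=20):
    """Per-frame confidence weights for a spotting loss.

    Far frames keep weight 1 (confident negatives); frames near an
    annotated timestamp are down-weighted as ambiguous; the annotated
    frame itself is restored to full weight as a confident positive.
    """
    w = np.ones(n_frames)                   # default: confident negative
    for t in action_frames:
        lo, hi = max(0, t - radius), min(n_frames, t + radius + 1)
        dist = np.abs(np.arange(lo, hi) - t)
        w[lo:hi] = np.minimum(w[lo:hi], dist / radius)  # soften around action
        w[t] = 1.0                          # the annotated frame itself
    return w
```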

Human Intracranial EEG Quantitative Analysis and Automatic Feature Learning for Epileptic Seizure Prediction

Title Human Intracranial EEG Quantitative Analysis and Automatic Feature Learning for Epileptic Seizure Prediction
Authors Ramy Hussein, Mohamed Osama Ahmed, Rabab Ward, Z. Jane Wang, Levin Kuhlmann, Yi Guo
Abstract Objective: The aim of this study is to develop an efficient and reliable epileptic seizure prediction system using intracranial EEG (iEEG) data, especially for people with drug-resistant epilepsy. The prediction procedure should yield accurate results quickly enough to alert patients of impending seizures. Methods: We quantitatively analyze human iEEG data to obtain insights into how the human brain behaves before and between epileptic seizures. We then introduce an efficient pre-processing method for reducing the data size and converting the time-series iEEG data into an image-like format that can be used as input to convolutional neural networks (CNNs). Further, we propose a seizure prediction algorithm that uses cooperative multi-scale CNNs for automatic feature learning from iEEG data. Results: 1) iEEG channels contain complementary information, and excluding individual channels is not advisable if the spatial information needed for accurate seizure prediction is to be retained. 2) Traditional PCA is not a reliable method for iEEG data reduction in seizure prediction. 3) Hand-crafted iEEG features may not support reliable seizure prediction, as the iEEG data varies between patients and over time for the same patient. 4) Seizure prediction results show that our algorithm outperforms existing methods, achieving an average sensitivity of 87.85% and an AUC score of 0.84. Conclusion: Understanding how the human brain behaves before seizures and far from them facilitates better designs of epileptic seizure predictors. Significance: Accurate seizure prediction algorithms can warn patients about the next seizure so that they can avoid dangerous activities. Medications could then be administered to abort the impending seizure and minimize the risk of injury.
Tasks EEG, Seizure prediction, Time Series
Published 2019-04-07
URL http://arxiv.org/abs/1904.03603v1
PDF http://arxiv.org/pdf/1904.03603v1.pdf
PWC https://paperswithcode.com/paper/human-intracranial-eeg-quantitative-analysis
Repo https://github.com/gabi-a/EEG-Literature
Framework none
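
The pre-processing step reshapes time-series iEEG into image-like inputs for CNNs; a minimal numpy sketch with placeholder window sizes (the paper additionally reduces the data size before this step):

```python
import numpy as np

def ieeg_to_images(x, win=1024, hop=512):
    """Slice multi-channel iEEG into overlapping windows and stack channels
    into image-like 2D arrays for a CNN.

    x: (n_channels, n_samples) -> (n_windows, n_channels, win)
    """
    n_ch, n = x.shape
    starts = range(0, n - win + 1, hop)
    return np.stack([x[:, s:s + win] for s in starts])

# iEEG with 16 channels, 30 s at 400 Hz -> 22 "images" of shape (16, 1024)
x = np.random.randn(16, 12000)
print(ieeg_to_images(x).shape)  # (22, 16, 1024)
```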

FoveaBox: Beyond Anchor-based Object Detector

Title FoveaBox: Beyond Anchor-based Object Detector
Authors Tao Kong, Fuchun Sun, Huaping Liu, Yuning Jiang, Jianbo Shi
Abstract We present FoveaBox, an accurate, flexible, and completely anchor-free framework for object detection. While almost all state-of-the-art object detectors utilize predefined anchors to enumerate possible locations, scales, and aspect ratios in the search for objects, their performance and generalization ability are also limited by the design of those anchors. Instead, FoveaBox directly learns the object existence possibility and the bounding box coordinates without anchor reference. This is achieved by: (a) predicting category-sensitive semantic maps for the object existence possibility, and (b) producing a category-agnostic bounding box for each position that potentially contains an object. The scales of target boxes are naturally associated with feature pyramid representations for each input image. Without bells and whistles, FoveaBox achieves state-of-the-art single-model performance of 42.1 AP on the standard COCO detection benchmark. Especially for objects with arbitrary aspect ratios, FoveaBox brings significant improvement compared to anchor-based detectors. More surprisingly, when challenged by stretched testing images, FoveaBox shows great robustness and generalization ability to the changed distribution of bounding box shapes. The code will be made publicly available.
Tasks Object Detection
Published 2019-04-08
URL http://arxiv.org/abs/1904.03797v1
PDF http://arxiv.org/pdf/1904.03797v1.pdf
PWC https://paperswithcode.com/paper/foveabox-beyond-anchor-based-object-detector
Repo https://github.com/taokong/FoveaBox
Framework pytorch
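
A minimal single-level sketch of an anchor-free head in this spirit, omitting FoveaBox's FPN and conv towers:

```python
import torch
import torch.nn as nn

class AnchorFreeHead(nn.Module):
    """Per-position class scores plus a category-agnostic 4-value box for
    every position, with no anchor enumeration. Real FoveaBox attaches
    such heads to a feature pyramid; this sketch covers one level.
    """
    def __init__(self, channels=256, n_classes=80):
        super().__init__()
        self.cls = nn.Conv2d(channels, n_classes, kernel_size=3, padding=1)
        self.box = nn.Conv2d(channels, 4, kernel_size=3, padding=1)

    def forward(self, feat):                    # feat: (B, C, H, W)
        return self.cls(feat), self.box(feat)   # (B,K,H,W), (B,4,H,W)
```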

SkyNet: a Hardware-Efficient Method for Object Detection and Tracking on Embedded Systems

Title SkyNet: a Hardware-Efficient Method for Object Detection and Tracking on Embedded Systems
Authors Xiaofan Zhang, Haoming Lu, Cong Hao, Jiachen Li, Bowen Cheng, Yuhong Li, Kyle Rupnow, Jinjun Xiong, Thomas Huang, Honghui Shi, Wen-mei Hwu, Deming Chen
Abstract Object detection and tracking are challenging tasks for resource-constrained embedded systems. While these tasks are among the most compute-intensive tasks in the artificial intelligence domain, they may only use limited computation and memory resources on embedded devices. Meanwhile, such resource-constrained implementations are often required to satisfy additional demanding requirements such as real-time response, high-throughput performance, and reliable inference accuracy. To overcome these challenges, we propose SkyNet, a hardware-efficient neural network that delivers state-of-the-art detection accuracy and speed for embedded systems. Instead of following the common top-down flow for compact DNN (Deep Neural Network) design, SkyNet provides a bottom-up DNN design approach with comprehensive understanding of the hardware constraints at the very beginning, to deliver hardware-efficient DNNs. The effectiveness of SkyNet is demonstrated by winning the competitive System Design Contest for low-power object detection in the 56th IEEE/ACM Design Automation Conference (DAC-SDC), where SkyNet significantly outperformed all 100+ competitors: it delivers 0.731 Intersection over Union (IoU) and 67.33 frames per second (FPS) on a TX2 embedded GPU, and 0.716 IoU and 25.05 FPS on an Ultra96 embedded FPGA. The evaluation of SkyNet is also extended to GOT-10K, a recent large-scale high-diversity benchmark for generic object tracking in the wild. For the state-of-the-art object trackers SiamRPN++ and SiamMask, where ResNet-50 is employed as the backbone, implementations using SkyNet as the backbone DNN are 1.60X and 1.73X faster with better or similar accuracy when running on a 1080Ti GPU, and 37.20X smaller in parameter size, giving a significantly better memory and storage footprint.
Tasks Object Detection, Object Tracking
Published 2019-09-20
URL https://arxiv.org/abs/1909.09709v2
PDF https://arxiv.org/pdf/1909.09709v2.pdf
PWC https://paperswithcode.com/paper/190909709
Repo https://github.com/TomG008/SkyNet
Framework pytorch

Unrolling Ternary Neural Networks

Title Unrolling Ternary Neural Networks
Authors Stephen Tridgell, Martin Kumm, Martin Hardieck, David Boland, Duncan Moss, Peter Zipf, Philip H. W. Leong
Abstract The computational complexity of neural networks for large-scale or real-time applications necessitates hardware acceleration. Most approaches assume that the network architecture and parameters are unknown at design time, permitting usage in a large number of applications. This paper demonstrates, for the case where the neural network architecture and ternary weight values are known a priori, that extremely high-throughput implementations of neural network inference can be built by customising the datapath and routing to remove unnecessary computations and data movement. This approach is ideally suited to FPGA implementations, as a specialized implementation of a trained network improves efficiency while still retaining generality through the reconfigurability of an FPGA. A VGG-style network with ternary weights and fixed-point activations is implemented for the CIFAR10 dataset on Amazon's AWS F1 instance. This paper demonstrates how to remove 90% of the operations in convolutional layers by exploiting sparsity and compile-time optimizations. The implementation in hardware achieves 90.9 +/- 0.1% accuracy at 122k frames per second, with a latency of only 29 µs, which is the fastest CNN inference implementation reported so far on an FPGA.
Tasks
Published 2019-09-09
URL https://arxiv.org/abs/1909.04509v1
PDF https://arxiv.org/pdf/1909.04509v1.pdf
PWC https://paperswithcode.com/paper/unrolling-ternary-neural-networks
Repo https://github.com/da-steve101/binary_connect_cifar
Framework none
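
The core compile-time trick is easy to mimic in software: with ternary weights fixed a priori, a dot product unrolls into pure adds and subtracts, and zero weights vanish entirely. A toy code-generation sketch, mirroring in spirit how the paper prunes operations from the FPGA datapath:

```python
def unroll_ternary_dot(weights, var="x"):
    """Emit the unrolled expression for dot(weights, x) when weights are
    ternary {-1, 0, +1} and fixed at compile time: zero weights produce
    no term at all, and +/-1 weights need no multiplier.
    """
    terms = []
    for i, w in enumerate(weights):
        if w == 1:
            terms.append(f"+ {var}[{i}]")
        elif w == -1:
            terms.append(f"- {var}[{i}]")
        # w == 0: no hardware, no term
    return " ".join(terms).lstrip("+ ") or "0"

print(unroll_ternary_dot([1, 0, -1, 0, 1]))  # x[0] - x[2] + x[4]
```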

4-Connected Shift Residual Networks

Title 4-Connected Shift Residual Networks
Authors Andrew Brown, Pascal Mettes, Marcel Worring
Abstract The shift operation was recently introduced as an alternative to spatial convolutions. The operation moves subsets of activations horizontally and/or vertically. Spatial convolutions are then replaced with shift operations followed by point-wise convolutions, significantly reducing computational costs. In this work, we investigate how shifts should best be applied to high accuracy CNNs. We apply shifts of two different neighbourhood groups to ResNet on ImageNet: the originally introduced 8-connected (8C) neighbourhood shift and the less well studied 4-connected (4C) neighbourhood shift. We find that when replacing ResNet’s spatial convolutions with shifts, both shift neighbourhoods give equal ImageNet accuracy, showing the sufficiency of small neighbourhoods for large images. Interestingly, when incorporating shifts to all point-wise convolutions in residual networks, 4-connected shifts outperform 8-connected shifts. Such a 4-connected shift setup gives the same accuracy as full residual networks while reducing the number of parameters and FLOPs by over 40%. We then highlight that without spatial convolutions, ResNet’s downsampling/upsampling bottleneck channel structure is no longer needed. We show a new, 4C shift-based residual network, much shorter than the original ResNet yet with a higher accuracy for the same computational cost. This network is the highest accuracy shift-based network yet shown, demonstrating the potential of shifting in deep neural networks.
Tasks
Published 2019-10-22
URL https://arxiv.org/abs/1910.09931v1
PDF https://arxiv.org/pdf/1910.09931v1.pdf
PWC https://paperswithcode.com/paper/4-connected-shift-residual-networks
Repo https://github.com/andrewgrahambrown/4CShiftResNet
Framework none
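
A 4-connected shift is simple to write down; this PyTorch sketch splits channels into five equal groups (up, down, left, right, identity), an even allocation that is an assumption rather than the paper's exact grouping:

```python
import torch

def shift_4c(x):
    """4-connected shift: move channel groups up / down / left / right by
    one pixel (zero padding), leaving the remainder unshifted.
    x: (B, C, H, W) with C >= 5.
    """
    out = torch.zeros_like(x)
    g = x.shape[1] // 5
    out[:, 0*g:1*g, :-1, :] = x[:, 0*g:1*g, 1:, :]   # up
    out[:, 1*g:2*g, 1:, :]  = x[:, 1*g:2*g, :-1, :]  # down
    out[:, 2*g:3*g, :, :-1] = x[:, 2*g:3*g, :, 1:]   # left
    out[:, 3*g:4*g, :, 1:]  = x[:, 3*g:4*g, :, :-1]  # right
    out[:, 4*g:]            = x[:, 4*g:]             # identity remainder
    return out
```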

Deep Clustering with a Dynamic Autoencoder: From Reconstruction towards Centroids Construction

Title Deep Clustering with a Dynamic Autoencoder: From Reconstruction towards Centroids Construction
Authors Nairouz Mrabah, Naimul Mefraz Khan, Riadh Ksantini, Zied Lachiri
Abstract In unsupervised learning, there is no obvious, straightforward cost function that can capture the significant factors of variation and similarity. Since natural systems have smooth dynamics, an opportunity is lost if an unsupervised objective function remains static during the training process. The absence of concrete supervision suggests that smooth dynamics should be integrated. Compared to classical static cost functions, dynamic objective functions make better use of the gradual, uncertain knowledge acquired through pseudo-supervision. In this paper, we propose the Dynamic Autoencoder (DynAE), a novel model for deep clustering that overcomes the clustering-reconstruction trade-off by gradually and smoothly eliminating the reconstruction objective in favor of a construction one. Experimental evaluations on benchmark datasets show that our approach achieves state-of-the-art results compared to the most relevant deep clustering methods.
Tasks Image Clustering
Published 2019-01-23
URL https://arxiv.org/abs/1901.07752v5
PDF https://arxiv.org/pdf/1901.07752v5.pdf
PWC https://paperswithcode.com/paper/deep-clustering-with-a-dynamic-autoencoder
Repo https://github.com/nairouz/DynAE
Framework tf
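
The mechanism is a smoothly moving blend of two objectives; as a caricature with a linear schedule (DynAE's actual transition is driven by pseudo-supervision, not a fixed clock):

```python
def dynae_style_loss(recon_loss, cluster_loss, epoch, total_epochs):
    """Blend reconstruction and centroid-construction objectives with a
    weight that moves smoothly from pure reconstruction (early) to pure
    clustering (late). The linear clock here is an illustrative assumption.
    """
    a = max(0.0, 1.0 - epoch / float(total_epochs))  # 1 -> 0 over training
    return a * recon_loss + (1.0 - a) * cluster_loss
```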

Learning Undirected Posteriors by Backpropagation through MCMC Updates

Title Learning Undirected Posteriors by Backpropagation through MCMC Updates
Authors Arash Vahdat, Evgeny Andriyash, William G. Macready
Abstract The representation of the posterior is a critical aspect of effective variational autoencoders (VAEs). Poor choices for the posterior have a detrimental impact on the generative performance of VAEs due to the mismatch with the true posterior. We extend the class of posterior models that may be learned by using undirected graphical models. We develop an efficient method to train undirected posteriors by showing that the gradient of the training objective with respect to the parameters of the undirected posterior can be computed by backpropagation through Markov chain Monte Carlo updates. We apply these gradient estimators for training discrete VAEs with Boltzmann machine posteriors and demonstrate that undirected models outperform previous results obtained using directed graphical models as posteriors.
Tasks Bayesian Inference
Published 2019-01-11
URL http://arxiv.org/abs/1901.03440v1
PDF http://arxiv.org/pdf/1901.03440v1.pdf
PWC https://paperswithcode.com/paper/learning-undirected-posteriors-by
Repo https://github.com/rickyHong/Quadrant-qupa-repl
Framework tf
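
As a heavily hedged toy, one can see why backpropagation through chain updates is possible by unrolling relaxed Gibbs updates (conditional means instead of binary samples) so that autograd reaches the posterior parameters; the paper works with genuine MCMC sample updates and a more careful estimator.

```python
import torch

def unrolled_relaxed_gibbs(W, b, v0, steps=5):
    """Differentiable stand-in for Gibbs updates in a Boltzmann-machine
    posterior: each binary unit is relaxed to its conditional mean, so the
    unrolled chain stays on the autograd graph and gradients w.r.t. (W, b)
    flow through the updates. This mean-field relaxation is only a toy.
    """
    h = v0
    for _ in range(steps):
        h = torch.sigmoid(h @ W + b)  # conditional means, kept differentiable
    return h

n = 8
W0 = (0.1 * torch.randn(n, n)).requires_grad_()
b = torch.zeros(n, requires_grad=True)
W = W0.triu(1) + W0.triu(1).t()           # symmetric couplings, zero diagonal
h = unrolled_relaxed_gibbs(W, b, torch.rand(4, n))
h.sum().backward()                        # gradients reach W0 and b
print(W0.grad.shape)                      # torch.Size([8, 8])
```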