January 31, 2020

3378 words 16 mins read

Paper Group AWR 378

Preference-based Interactive Multi-Document Summarisation

Title Preference-based Interactive Multi-Document Summarisation
Authors Yang Gao, Christian M. Meyer, Iryna Gurevych
Abstract Interactive NLP is a promising paradigm to close the gap between automatic NLP systems and the human upper bound. Preference-based interactive learning has been successfully applied, but the existing methods require several thousand interaction rounds even in simulations with perfect user feedback. In this paper, we study preference-based interactive summarisation. To reduce the number of interaction rounds, we propose the Active Preference-based ReInforcement Learning (APRIL) framework. APRIL uses Active Learning to query the user, Preference Learning to learn a summary ranking function from the preferences, and neural Reinforcement Learning to efficiently search for the (near-)optimal summary. Our results show that users can easily provide reliable preferences over summaries and that APRIL outperforms the state-of-the-art preference-based interactive method in both simulation and real-user experiments.
Tasks Active Learning
Published 2019-06-07
URL https://arxiv.org/abs/1906.02923v1
PDF https://arxiv.org/pdf/1906.02923v1.pdf
PWC https://paperswithcode.com/paper/preference-based-interactive-multi-document
Repo https://github.com/UKPLab/irj-neural-april
Framework none
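
The APRIL loop couples three standard components: active querying, preference learning, and reinforcement learning. As a rough illustration of the preference-learning piece alone (not the authors' exact model), a linear Bradley-Terry ranker can be fit from preference pairs over summary feature vectors; feature extraction and the active-learning query strategy are assumed away here.

```python
import numpy as np

def train_preference_ranker(pairs, dim, lr=0.1, epochs=100):
    """Fit a linear Bradley-Terry ranking function w from preference pairs.

    pairs: list of (x_preferred, x_other) summary feature vectors.
    Trained so that sigmoid(w @ (x_pref - x_other)) -> 1, i.e. the
    preferred summary scores higher under w @ x.
    """
    w = np.zeros(dim)
    for _ in range(epochs):
        for x_pref, x_other in pairs:
            d = x_pref - x_other
            p = 1.0 / (1.0 + np.exp(-w @ d))  # P(preferred beats other)
            w += lr * (1.0 - p) * d           # ascend the log-likelihood
    return w

# Toy usage with random 5-d "summary features":
rng = np.random.default_rng(0)
pairs = [(rng.normal(size=5), rng.normal(size=5)) for _ in range(20)]
w = train_preference_ranker(pairs, dim=5)
```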

Convolutional neural network models for cancer type prediction based on gene expression

Title Convolutional neural network models for cancer type prediction based on gene expression
Authors Milad Mostavi, Yu-Chiao Chiu, Yufei Huang, Yidong Chen
Abstract Background: Precise prediction of cancer types is vital for cancer diagnosis and therapy, and important cancer marker genes can be inferred through predictive models. Several studies have attempted to build machine learning models for this task; however, none has taken into consideration the effects of tissue of origin, which can potentially bias the identification of cancer markers. Results: In this paper, we introduce several Convolutional Neural Network (CNN) models that take unstructured gene expression inputs to classify tumor and non-tumor samples into their designated cancer types or as normal. Based on different designs of gene embeddings and convolution schemes, we implemented three CNN models: 1D-CNN, 2D-Vanilla-CNN, and 2D-Hybrid-CNN. The models were trained and tested on a combined set of 10,340 samples of 33 cancer types and 731 matched normal tissues from The Cancer Genome Atlas (TCGA). Our models achieved excellent prediction accuracies (93.9-95.0%) among 34 classes (33 cancers and normal). Furthermore, we interpreted the 1D-CNN model with a guided saliency technique and identified a total of 2,090 cancer markers (108 per class). The concordance of differential expression of these markers between the cancer types they represent and the others was confirmed. In breast cancer, for instance, our model identified well-known markers such as GATA3 and ESR1. Finally, we extended the 1D-CNN model to predict breast cancer subtypes and achieved an average accuracy of 88.42% among 5 subtypes. The code can be found at https://github.com/chenlabgccri/CancerTypePrediction.
Tasks
Published 2019-06-18
URL https://arxiv.org/abs/1906.07794v1
PDF https://arxiv.org/pdf/1906.07794v1.pdf
PWC https://paperswithcode.com/paper/convolutional-neural-network-models-for
Repo https://github.com/chenlabgccri/CancerTypePrediction
Framework none
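
For flavor, a minimal PyTorch sketch of a 1D-CNN over an expression vector follows; the layer sizes and pooling choices are placeholders, not the authors' published 1D-CNN architecture (see their repo for that).

```python
import torch
import torch.nn as nn

class GeneExpr1DCNN(nn.Module):
    """Rough sketch of a 1D-CNN over a gene-expression vector.

    Input: (batch, 1, n_genes); output: logits over 34 classes.
    """
    def __init__(self, n_genes=20000, n_classes=34):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=9, stride=2), nn.ReLU(),
            nn.MaxPool1d(4),
            nn.Conv1d(32, 64, kernel_size=9, stride=2), nn.ReLU(),
            nn.AdaptiveMaxPool1d(8),  # fixed-length summary regardless of n_genes
        )
        self.classifier = nn.Linear(64 * 8, n_classes)

    def forward(self, x):
        h = self.features(x)
        return self.classifier(h.flatten(1))

# expr = torch.randn(16, 1, 20000)   # batch of 16 expression profiles
# logits = GeneExpr1DCNN()(expr)     # (16, 34)
```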

Deep Interpretable Non-Rigid Structure from Motion

Title Deep Interpretable Non-Rigid Structure from Motion
Authors Chen Kong, Simon Lucey
Abstract All current non-rigid structure from motion (NRSfM) algorithms are limited with respect to (i) the number of images and (ii) the type of shape variability they can handle. This has hampered the practical utility of NRSfM for many applications within vision. In this paper we propose a novel deep neural network to recover camera poses and 3D points solely from an ensemble of 2D image coordinates. The proposed neural network is mathematically interpretable as a multi-layer block sparse dictionary learning problem, and can handle problems of unprecedented scale and shape complexity. Extensive experiments demonstrate the impressive performance of our approach, which exhibits superior precision and robustness compared to all available state-of-the-art works. The considerable model capacity of our approach affords remarkable generalization to unseen data. We propose a quality measure (based on the network weights) which circumvents the need for 3D ground truth to ascertain the confidence we have in the reconstruction. Once the network's weights are estimated (for a non-rigid object), we show how our approach can effectively recover 3D shape from a single image, outperforming comparable methods that rely on direct 3D supervision.
Tasks Dictionary Learning
Published 2019-02-28
URL http://arxiv.org/abs/1902.10840v1
PDF http://arxiv.org/pdf/1902.10840v1.pdf
PWC https://paperswithcode.com/paper/deep-interpretable-non-rigid-structure-from
Repo https://github.com/kongchen1992/deep-nrsfm
Framework tf
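
The network is interpreted as multi-layer block sparse dictionary learning; as background only, a single-layer sparse-coding step can be solved with ISTA. This numpy sketch illustrates that primitive, not the paper's multi-layer block-sparse architecture.

```python
import numpy as np

def ista(w, D, lam=0.1, step=None, iters=200):
    """Single-layer sparse coding via ISTA: argmin_z ||w - D z||^2 + lam ||z||_1.

    w: measurement vector (e.g. flattened 2D coordinates), D: dictionary (m, k).
    """
    if step is None:
        step = 1.0 / np.linalg.norm(D, 2) ** 2  # 1/L, L = Lipschitz const of gradient
    z = np.zeros(D.shape[1])
    for _ in range(iters):
        grad = D.T @ (D @ z - w)
        z = z - step * grad
        z = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft-threshold
    return z
```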

A Multi-cascaded Deep Model for Bilingual SMS Classification

Title A Multi-cascaded Deep Model for Bilingual SMS Classification
Authors Muhammad Haroon Shakeel, Asim Karim, Imdadullah Khan
Abstract Most studies on text classification focus on the English language. However, short texts such as SMS messages are influenced by regional languages, which makes automatic text classification challenging due to the multilingual, informal, and noisy nature of the text. In this work, we propose a novel multi-cascaded deep learning model called McM for bilingual SMS classification. McM exploits $n$-gram-level information as well as long-term dependencies of text for learning. Our approach aims to learn a model without any code-switching indication, lexical normalization, language translation, or language transliteration. The model relies entirely upon the text, as no external knowledge base is utilized for learning. For this purpose, a 12-class bilingual text dataset is developed from SMS feedback from citizens on public services, containing mixed Roman Urdu and English. Our model achieves high classification accuracy on this dataset and outperforms the previous model for multilingual text classification, highlighting the language independence of McM.
Tasks Lexical Normalization, Multilingual text classification, Text Classification, Transliteration
Published 2019-11-29
URL https://arxiv.org/abs/1911.13066v1
PDF https://arxiv.org/pdf/1911.13066v1.pdf
PWC https://paperswithcode.com/paper/a-multi-cascaded-deep-model-for-bilingual-sms
Repo https://github.com/haroonshakeel/bilingual_sms_classification
Framework none
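
As a loose sketch of the multi-cascaded idea, assuming illustrative dimensions throughout, one can fuse a CNN cascade (n-gram cues) with a bidirectional LSTM cascade (long-range dependencies) before a shared classifier; the real McM differs in its cascade design.

```python
import torch
import torch.nn as nn

class McMSketch(nn.Module):
    """Illustrative two-cascade text classifier: a CNN cascade for n-gram
    cues plus an LSTM cascade for long-range dependencies, fused before
    the classifier. Dimensions are assumptions, not the paper's exact McM.
    """
    def __init__(self, vocab=30000, emb=128, n_classes=12):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.conv = nn.Sequential(nn.Conv1d(emb, 128, kernel_size=3, padding=1),
                                  nn.ReLU(), nn.AdaptiveMaxPool1d(1))
        self.lstm = nn.LSTM(emb, 64, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(128 + 2 * 64, n_classes)

    def forward(self, tokens):                        # tokens: (batch, seq_len)
        e = self.embed(tokens)                        # (B, T, emb)
        c = self.conv(e.transpose(1, 2)).squeeze(-1)  # (B, 128) n-gram features
        _, (h, _) = self.lstm(e)                      # h: (2, B, 64)
        r = torch.cat([h[0], h[1]], dim=1)            # (B, 128) sequence features
        return self.fc(torch.cat([c, r], dim=1))
```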

An Open-Source Framework for Adaptive Traffic Signal Control

Title An Open-Source Framework for Adaptive Traffic Signal Control
Authors Wade Genders, Saiedeh Razavi
Abstract Sub-optimal control policies in transportation systems negatively impact mobility, the environment, and human health. Developing optimal transportation control systems at the appropriate scale can be difficult, as cities' transportation systems can be large, complex, and stochastic. Intersection traffic signal controllers are an important element of modern transportation infrastructure where sub-optimal control policies can incur high costs to many users. Many adaptive traffic signal controllers have been proposed by the community, but research is lacking regarding their relative performance: which adaptive traffic signal controller is best remains an open question. This research contributes a framework for developing and evaluating different adaptive traffic signal controller models in simulation, both learning and non-learning, and demonstrates its capabilities. The framework is used first to investigate the performance variance of the modelled adaptive traffic signal controllers with respect to their hyperparameters, and second to analyze the performance differences between controllers with optimal hyperparameters. The proposed framework contains implementations of some of the most popular adaptive traffic signal controllers from the literature: Webster's, Max-pressure, and Self-Organizing Traffic Lights, along with deep Q-network and deep deterministic policy gradient reinforcement learning controllers. This framework will aid researchers by accelerating their work from a common starting point, allowing them to generate results faster with less effort. All framework source code is available at https://github.com/docwza/sumolights.
Tasks
Published 2019-09-01
URL https://arxiv.org/abs/1909.00395v1
PDF https://arxiv.org/pdf/1909.00395v1.pdf
PWC https://paperswithcode.com/paper/an-open-source-framework-for-adaptive-traffic
Repo https://github.com/docwza/sumolights
Framework none
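
Among the implemented controllers, Max-pressure has a particularly compact decision rule; a minimal sketch follows, with the SUMO glue code omitted and hypothetical lane names.

```python
def max_pressure_phase(phases, queue):
    """Pick the signal phase with maximum 'pressure'.

    phases: dict phase -> list of (inbound_lane, outbound_lane) movements
    queue:  dict lane -> current queue length
    """
    def pressure(movements):
        return sum(queue[i] - queue[o] for i, o in movements)
    return max(phases, key=lambda p: pressure(phases[p]))

# Example: two phases at a toy intersection.
queue = {"N_in": 8, "S_in": 6, "E_in": 2, "W_in": 1,
         "N_out": 0, "S_out": 1, "E_out": 3, "W_out": 0}
phases = {"NS": [("N_in", "S_out"), ("S_in", "N_out")],
          "EW": [("E_in", "W_out"), ("W_in", "E_out")]}
print(max_pressure_phase(phases, queue))  # -> "NS"
```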

Black is to Criminal as Caucasian is to Police: Detecting and Removing Multiclass Bias in Word Embeddings

Title Black is to Criminal as Caucasian is to Police: Detecting and Removing Multiclass Bias in Word Embeddings
Authors Thomas Manzini, Yao Chong Lim, Yulia Tsvetkov, Alan W Black
Abstract Online texts – across genres, registers, domains, and styles – are riddled with human stereotypes, expressed in overt or subtle ways. Word embeddings trained on these texts perpetuate and amplify these stereotypes, and propagate biases to machine learning models that use word embeddings as features. In this work, we propose a method to debias word embeddings in multiclass settings such as race and religion, extending the work of Bolukbasi et al. (2016) from binary settings such as binary gender. Next, we propose a novel methodology for the evaluation of multiclass debiasing. We demonstrate that our multiclass debiasing is robust and maintains efficacy on standard NLP tasks.
Tasks Word Embeddings
Published 2019-04-03
URL https://arxiv.org/abs/1904.04047v3
PDF https://arxiv.org/pdf/1904.04047v3.pdf
PWC https://paperswithcode.com/paper/black-is-to-criminal-as-caucasian-is-to
Repo https://github.com/TManzini/DebiasMulticlassWordEmbedding
Framework pytorch
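
The hard-debiasing recipe generalizes to multiple classes by estimating a bias subspace from several definitional word sets and projecting it out. A numpy sketch roughly following that recipe (the equalization step of the full method is left out):

```python
import numpy as np

def bias_subspace(def_sets, k=2):
    """PCA of mean-centred definitional sets (e.g. sets of race or
    religion terms). def_sets: list of arrays, each (n_words, dim)."""
    centred = np.vstack([s - s.mean(axis=0) for s in def_sets])
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return vt[:k]                      # top-k bias directions, (k, dim)

def neutralize(vecs, B):
    """Remove the bias-subspace component from each embedding row."""
    return vecs - (vecs @ B.T) @ B
```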

Beyond Human Parts: Dual Part-Aligned Representations for Person Re-Identification

Title Beyond Human Parts: Dual Part-Aligned Representations for Person Re-Identification
Authors Jianyuan Guo, Yuhui Yuan, Lang Huang, Chao Zhang, Jinge Yao, Kai Han
Abstract Person re-identification is a challenging task due to various complex factors. Recent studies have attempted to integrate human parsing results or externally defined attributes to help capture human parts or important object regions. On the other hand, there still exist many useful contextual cues that do not fall into the scope of predefined human parts or attributes. In this paper, we address the missed contextual cues by exploiting both the accurate human parts and the coarse non-human parts. In our implementation, we apply a human parsing model to extract the binary human part masks \emph{and} a self-attention mechanism to capture the soft latent (non-human) part masks. We verify the effectiveness of our approach with new state-of-the-art performances on three challenging benchmarks: Market-1501, DukeMTMC-reID and CUHK03. Our implementation is available at https://github.com/ggjy/P2Net.pytorch.
Tasks Human Parsing, Person Re-Identification
Published 2019-10-22
URL https://arxiv.org/abs/1910.10111v1
PDF https://arxiv.org/pdf/1910.10111v1.pdf
PWC https://paperswithcode.com/paper/beyond-human-parts-dual-part-aligned
Repo https://github.com/ggjy/P2Net.pytorch
Framework pytorch
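
A hedged sketch of the dual pooling idea, with layer choices that are assumptions rather than the exact P2-Net blocks:

```python
import torch
import torch.nn as nn

class DualPartPooling(nn.Module):
    """Pool a feature map under (i) binary human-part masks from an external
    parser and (ii) soft latent (non-human) masks predicted by a 1x1-conv
    attention head, yielding one descriptor per part.
    """
    def __init__(self, channels, n_latent=4):
        super().__init__()
        self.latent = nn.Conv2d(channels, n_latent, kernel_size=1)

    @staticmethod
    def masked_avg(feat, masks):            # feat (B,C,H,W), masks (B,P,H,W)
        num = torch.einsum("bchw,bphw->bpc", feat, masks)
        den = masks.sum(dim=(2, 3)).clamp(min=1e-6).unsqueeze(-1)
        return num / den                    # (B, P, C): one vector per part

    def forward(self, feat, human_masks):
        a = self.latent(feat)                          # (B, n_latent, H, W)
        soft = a.flatten(2).softmax(-1).view(a.shape)  # spatial softmax masks
        return self.masked_avg(feat, human_masks), self.masked_avg(feat, soft)
```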

A Context-Aware Loss Function for Action Spotting in Soccer Videos

Title A Context-Aware Loss Function for Action Spotting in Soccer Videos
Authors Anthony Cioppa, Adrien Deliège, Silvio Giancola, Bernard Ghanem, Marc Van Droogenbroeck, Rikke Gade, Thomas B. Moeslund
Abstract In video understanding, action spotting consists of temporally localizing human-induced events annotated with single timestamps. In this paper, we propose a novel loss function that explicitly accounts for the temporal context naturally present around each action, rather than focusing on the single annotated frame to spot. We benchmark our loss on a large dataset of soccer videos, SoccerNet, and achieve an improvement of 12.8% over the baseline. We show the generalization capability of our loss for generic activity proposals and detection on ActivityNet, by spotting the beginning and the end of each activity. Furthermore, we provide an extended ablation study and display challenging cases for action spotting in soccer videos. Finally, we qualitatively illustrate how our loss induces a precise temporal understanding of actions and show how such semantic knowledge can be used for automatic highlights generation.
Tasks Action Spotting, Video Understanding
Published 2019-12-03
URL https://arxiv.org/abs/1912.01326v3
PDF https://arxiv.org/pdf/1912.01326v3.pdf
PWC https://paperswithcode.com/paper/a-context-aware-loss-function-for-action
Repo https://github.com/cioppaanthony/context-aware-loss
Framework tf
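
As a toy illustration only, one way to encode "temporal context around an annotated timestamp" is a per-frame loss weight that treats the annotated frame as a confident positive, its immediate neighbourhood as ambiguous, and distant frames as confident negatives; the paper's actual loss slices the context more finely than this.

```python
import numpy as np

def context_weights(n_frames, action_frames, radius=20):
    """Per-frame confidence weights for a spotting loss.

    Far frames keep weight 1 (confident negatives); frames near an
    annotated timestamp are down-weighted as ambiguous; the annotated
    frame itself is restored to full weight as a confident positive.
    """
    w = np.ones(n_frames)                   # default: confident negative
    for t in action_frames:
        lo, hi = max(0, t - radius), min(n_frames, t + radius + 1)
        dist = np.abs(np.arange(lo, hi) - t)
        w[lo:hi] = np.minimum(w[lo:hi], dist / radius)  # soften around action
        w[t] = 1.0                          # the annotated frame itself
    return w
```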

Human Intracranial EEG Quantitative Analysis and Automatic Feature Learning for Epileptic Seizure Prediction

Title Human Intracranial EEG Quantitative Analysis and Automatic Feature Learning for Epileptic Seizure Prediction
Authors Ramy Hussein, Mohamed Osama Ahmed, Rabab Ward, Z. Jane Wang, Levin Kuhlmann, Yi Guo
Abstract Objective: The aim of this study is to develop an efficient and reliable epileptic seizure prediction system using intracranial EEG (iEEG) data, especially for people with drug-resistant epilepsy. The prediction procedure should yield accurate results quickly enough to alert patients of impending seizures. Methods: We quantitatively analyze human iEEG data to obtain insights into how the human brain behaves before and between epileptic seizures. We then introduce an efficient pre-processing method for reducing the data size and converting the time-series iEEG data into an image-like format that can be used as input to convolutional neural networks (CNNs). Further, we propose a seizure prediction algorithm that uses cooperative multi-scale CNNs for automatic feature learning from iEEG data. Results: 1) iEEG channels contain complementary information, and excluding individual channels is not advisable if the spatial information needed for accurate seizure prediction is to be retained. 2) Traditional PCA is not a reliable method for iEEG data reduction in seizure prediction. 3) Hand-crafted iEEG features may not support reliable seizure prediction, as the iEEG data varies between patients and over time for the same patient. 4) Seizure prediction results show that our algorithm outperforms existing methods, achieving an average sensitivity of 87.85% and an AUC score of 0.84. Conclusion: Understanding how the human brain behaves before seizures and far from them facilitates better designs of epileptic seizure predictors. Significance: Accurate seizure prediction algorithms can warn patients about the next seizure so that they can avoid dangerous activities. Medications could then be administered to abort the impending seizure and minimize the risk of injury.
Tasks EEG, Seizure prediction, Time Series
Published 2019-04-07
URL http://arxiv.org/abs/1904.03603v1
PDF http://arxiv.org/pdf/1904.03603v1.pdf
PWC https://paperswithcode.com/paper/human-intracranial-eeg-quantitative-analysis
Repo https://github.com/gabi-a/EEG-Literature
Framework none
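
The pre-processing step reshapes time-series iEEG into image-like inputs for CNNs; a minimal numpy sketch with placeholder window sizes (the paper additionally reduces the data size before this step):

```python
import numpy as np

def ieeg_to_images(x, win=1024, hop=512):
    """Slice multi-channel iEEG into overlapping windows and stack channels
    into image-like 2D arrays for a CNN.

    x: (n_channels, n_samples) -> (n_windows, n_channels, win)
    """
    n_ch, n = x.shape
    starts = range(0, n - win + 1, hop)
    return np.stack([x[:, s:s + win] for s in starts])

# iEEG with 16 channels, 30 s at 400 Hz -> 22 "images" of shape (16, 1024)
x = np.random.randn(16, 12000)
print(ieeg_to_images(x).shape)  # (22, 16, 1024)
```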

FoveaBox: Beyond Anchor-based Object Detector

Title FoveaBox: Beyond Anchor-based Object Detector
Authors Tao Kong, Fuchun Sun, Huaping Liu, Yuning Jiang, Jianbo Shi
Abstract We present FoveaBox, an accurate, flexible, and completely anchor-free framework for object detection. While almost all state-of-the-art object detectors utilize predefined anchors to enumerate possible locations, scales, and aspect ratios in the search for objects, their performance and generalization ability are also limited by the design of those anchors. Instead, FoveaBox directly learns the object existence possibility and the bounding box coordinates without anchor reference. This is achieved by: (a) predicting category-sensitive semantic maps for the object existence possibility, and (b) producing a category-agnostic bounding box for each position that potentially contains an object. The scales of target boxes are naturally associated with feature pyramid representations for each input image. Without bells and whistles, FoveaBox achieves state-of-the-art single-model performance of 42.1 AP on the standard COCO detection benchmark. Especially for objects with arbitrary aspect ratios, FoveaBox brings significant improvement compared to anchor-based detectors. More surprisingly, when challenged by stretched testing images, FoveaBox shows great robustness and generalization ability to the changed distribution of bounding box shapes. The code will be made publicly available.
Tasks Object Detection
Published 2019-04-08
URL http://arxiv.org/abs/1904.03797v1
PDF http://arxiv.org/pdf/1904.03797v1.pdf
PWC https://paperswithcode.com/paper/foveabox-beyond-anchor-based-object-detector
Repo https://github.com/taokong/FoveaBox
Framework pytorch
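
A minimal single-level sketch of an anchor-free head in this spirit, omitting FoveaBox's FPN and conv towers:

```python
import torch
import torch.nn as nn

class AnchorFreeHead(nn.Module):
    """Per-position class scores plus a category-agnostic 4-value box for
    every position, with no anchor enumeration. Real FoveaBox attaches
    such heads to a feature pyramid; this sketch covers one level.
    """
    def __init__(self, channels=256, n_classes=80):
        super().__init__()
        self.cls = nn.Conv2d(channels, n_classes, kernel_size=3, padding=1)
        self.box = nn.Conv2d(channels, 4, kernel_size=3, padding=1)

    def forward(self, feat):                    # feat: (B, C, H, W)
        return self.cls(feat), self.box(feat)   # (B,K,H,W), (B,4,H,W)
```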

SkyNet: a Hardware-Efficient Method for Object Detection and Tracking on Embedded Systems

Title SkyNet: a Hardware-Efficient Method for Object Detection and Tracking on Embedded Systems
Authors Xiaofan Zhang, Haoming Lu, Cong Hao, Jiachen Li, Bowen Cheng, Yuhong Li, Kyle Rupnow, Jinjun Xiong, Thomas Huang, Honghui Shi, Wen-mei Hwu, Deming Chen
Abstract Object detection and tracking are challenging tasks for resource-constrained embedded systems. While these tasks are among the most compute-intensive tasks in the artificial intelligence domain, they may only use limited computation and memory resources on embedded devices. Meanwhile, such resource-constrained implementations are often required to satisfy additional demanding requirements such as real-time response, high-throughput performance, and reliable inference accuracy. To overcome these challenges, we propose SkyNet, a hardware-efficient neural network that delivers state-of-the-art detection accuracy and speed for embedded systems. Instead of following the common top-down flow for compact DNN (Deep Neural Network) design, SkyNet provides a bottom-up DNN design approach with comprehensive understanding of the hardware constraints at the very beginning, to deliver hardware-efficient DNNs. The effectiveness of SkyNet is demonstrated by winning the competitive System Design Contest for low-power object detection in the 56th IEEE/ACM Design Automation Conference (DAC-SDC), where SkyNet significantly outperformed all 100+ competitors: it delivers 0.731 Intersection over Union (IoU) and 67.33 frames per second (FPS) on a TX2 embedded GPU, and 0.716 IoU and 25.05 FPS on an Ultra96 embedded FPGA. The evaluation of SkyNet is also extended to GOT-10K, a recent large-scale high-diversity benchmark for generic object tracking in the wild. For the state-of-the-art object trackers SiamRPN++ and SiamMask, where ResNet-50 is employed as the backbone, implementations using SkyNet as the backbone DNN are 1.60X and 1.73X faster with better or similar accuracy when running on a 1080Ti GPU, and 37.20X smaller in parameter size, giving a significantly better memory and storage footprint.
Tasks Object Detection, Object Tracking
Published 2019-09-20
URL https://arxiv.org/abs/1909.09709v2
PDF https://arxiv.org/pdf/1909.09709v2.pdf
PWC https://paperswithcode.com/paper/190909709
Repo https://github.com/TomG008/SkyNet
Framework pytorch

Unrolling Ternary Neural Networks

Title Unrolling Ternary Neural Networks
Authors Stephen Tridgell, Martin Kumm, Martin Hardieck, David Boland, Duncan Moss, Peter Zipf, Philip H. W. Leong
Abstract The computational complexity of neural networks for large-scale or real-time applications necessitates hardware acceleration. Most approaches assume that the network architecture and parameters are unknown at design time, permitting usage in a large number of applications. This paper demonstrates, for the case where the neural network architecture and ternary weight values are known a priori, that extremely high-throughput implementations of neural network inference can be built by customising the datapath and routing to remove unnecessary computations and data movement. This approach is ideally suited to FPGA implementations, as a specialized implementation of a trained network improves efficiency while still retaining generality through the reconfigurability of an FPGA. A VGG-style network with ternary weights and fixed-point activations is implemented for the CIFAR10 dataset on Amazon's AWS F1 instance. This paper demonstrates how to remove 90% of the operations in convolutional layers by exploiting sparsity and compile-time optimizations. The implementation in hardware achieves 90.9 +/- 0.1% accuracy at 122k frames per second, with a latency of only 29 µs, which is the fastest CNN inference implementation reported so far on an FPGA.
Tasks
Published 2019-09-09
URL https://arxiv.org/abs/1909.04509v1
PDF https://arxiv.org/pdf/1909.04509v1.pdf
PWC https://paperswithcode.com/paper/unrolling-ternary-neural-networks
Repo https://github.com/da-steve101/binary_connect_cifar
Framework none
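
The core compile-time trick is easy to mimic in software: with ternary weights fixed a priori, a dot product unrolls into pure adds and subtracts, and zero weights vanish entirely. A toy code-generation sketch, mirroring in spirit how the paper prunes operations from the FPGA datapath:

```python
def unroll_ternary_dot(weights, var="x"):
    """Emit the unrolled expression for dot(weights, x) when weights are
    ternary {-1, 0, +1} and fixed at compile time: zero weights produce
    no term at all, and +/-1 weights need no multiplier.
    """
    terms = []
    for i, w in enumerate(weights):
        if w == 1:
            terms.append(f"+ {var}[{i}]")
        elif w == -1:
            terms.append(f"- {var}[{i}]")
        # w == 0: no hardware, no term
    return " ".join(terms).lstrip("+ ") or "0"

print(unroll_ternary_dot([1, 0, -1, 0, 1]))  # x[0] - x[2] + x[4]
```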

4-Connected Shift Residual Networks

Title 4-Connected Shift Residual Networks
Authors Andrew Brown, Pascal Mettes, Marcel Worring
Abstract The shift operation was recently introduced as an alternative to spatial convolutions. The operation moves subsets of activations horizontally and/or vertically. Spatial convolutions are then replaced with shift operations followed by point-wise convolutions, significantly reducing computational costs. In this work, we investigate how shifts should best be applied to high accuracy CNNs. We apply shifts of two different neighbourhood groups to ResNet on ImageNet: the originally introduced 8-connected (8C) neighbourhood shift and the less well studied 4-connected (4C) neighbourhood shift. We find that when replacing ResNet’s spatial convolutions with shifts, both shift neighbourhoods give equal ImageNet accuracy, showing the sufficiency of small neighbourhoods for large images. Interestingly, when incorporating shifts to all point-wise convolutions in residual networks, 4-connected shifts outperform 8-connected shifts. Such a 4-connected shift setup gives the same accuracy as full residual networks while reducing the number of parameters and FLOPs by over 40%. We then highlight that without spatial convolutions, ResNet’s downsampling/upsampling bottleneck channel structure is no longer needed. We show a new, 4C shift-based residual network, much shorter than the original ResNet yet with a higher accuracy for the same computational cost. This network is the highest accuracy shift-based network yet shown, demonstrating the potential of shifting in deep neural networks.
Tasks
Published 2019-10-22
URL https://arxiv.org/abs/1910.09931v1
PDF https://arxiv.org/pdf/1910.09931v1.pdf
PWC https://paperswithcode.com/paper/4-connected-shift-residual-networks
Repo https://github.com/andrewgrahambrown/4CShiftResNet
Framework none
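
A 4-connected shift is simple to write down; this PyTorch sketch splits channels into five equal groups (up, down, left, right, identity), an even allocation that is an assumption rather than the paper's exact grouping:

```python
import torch

def shift_4c(x):
    """4-connected shift: move channel groups up / down / left / right by
    one pixel (zero padding), leaving the remainder unshifted.
    x: (B, C, H, W) with C >= 5.
    """
    out = torch.zeros_like(x)
    g = x.shape[1] // 5
    out[:, 0*g:1*g, :-1, :] = x[:, 0*g:1*g, 1:, :]   # up
    out[:, 1*g:2*g, 1:, :]  = x[:, 1*g:2*g, :-1, :]  # down
    out[:, 2*g:3*g, :, :-1] = x[:, 2*g:3*g, :, 1:]   # left
    out[:, 3*g:4*g, :, 1:]  = x[:, 3*g:4*g, :, :-1]  # right
    out[:, 4*g:]            = x[:, 4*g:]             # identity remainder
    return out
```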

Deep Clustering with a Dynamic Autoencoder: From Reconstruction towards Centroids Construction

Title Deep Clustering with a Dynamic Autoencoder: From Reconstruction towards Centroids Construction
Authors Nairouz Mrabah, Naimul Mefraz Khan, Riadh Ksantini, Zied Lachiri
Abstract In unsupervised learning, there is no obvious, straightforward cost function that can capture the significant factors of variation and similarity. Since natural systems have smooth dynamics, an opportunity is lost if an unsupervised objective function remains static during the training process. The absence of concrete supervision suggests that smooth dynamics should be integrated. Compared to classical static cost functions, dynamic objective functions make better use of the gradual, uncertain knowledge acquired through pseudo-supervision. In this paper, we propose the Dynamic Autoencoder (DynAE), a novel model for deep clustering that overcomes the clustering-reconstruction trade-off by gradually and smoothly eliminating the reconstruction objective in favor of a construction one. Experimental evaluations on benchmark datasets show that our approach achieves state-of-the-art results compared to the most relevant deep clustering methods.
Tasks Image Clustering
Published 2019-01-23
URL https://arxiv.org/abs/1901.07752v5
PDF https://arxiv.org/pdf/1901.07752v5.pdf
PWC https://paperswithcode.com/paper/deep-clustering-with-a-dynamic-autoencoder
Repo https://github.com/nairouz/DynAE
Framework tf
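
The mechanism is a smoothly moving blend of two objectives; as a caricature with a linear schedule (DynAE's actual transition is driven by pseudo-supervision, not a fixed clock):

```python
def dynae_style_loss(recon_loss, cluster_loss, epoch, total_epochs):
    """Blend reconstruction and centroid-construction objectives with a
    weight that moves smoothly from pure reconstruction (early) to pure
    clustering (late). The linear clock here is an illustrative assumption.
    """
    a = max(0.0, 1.0 - epoch / float(total_epochs))  # 1 -> 0 over training
    return a * recon_loss + (1.0 - a) * cluster_loss
```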

Learning Undirected Posteriors by Backpropagation through MCMC Updates

Title Learning Undirected Posteriors by Backpropagation through MCMC Updates
Authors Arash Vahdat, Evgeny Andriyash, William G. Macready
Abstract The representation of the posterior is a critical aspect of effective variational autoencoders (VAEs). Poor choices for the posterior have a detrimental impact on the generative performance of VAEs due to the mismatch with the true posterior. We extend the class of posterior models that may be learned by using undirected graphical models. We develop an efficient method to train undirected posteriors by showing that the gradient of the training objective with respect to the parameters of the undirected posterior can be computed by backpropagation through Markov chain Monte Carlo updates. We apply these gradient estimators for training discrete VAEs with Boltzmann machine posteriors and demonstrate that undirected models outperform previous results obtained using directed graphical models as posteriors.
Tasks Bayesian Inference
Published 2019-01-11
URL http://arxiv.org/abs/1901.03440v1
PDF http://arxiv.org/pdf/1901.03440v1.pdf
PWC https://paperswithcode.com/paper/learning-undirected-posteriors-by
Repo https://github.com/rickyHong/Quadrant-qupa-repl
Framework tf
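
As a heavily hedged toy, one can see why backpropagation through chain updates is possible by unrolling relaxed Gibbs updates (conditional means instead of binary samples) so that autograd reaches the posterior parameters; the paper works with genuine MCMC sample updates and a more careful estimator.

```python
import torch

def unrolled_relaxed_gibbs(W, b, v0, steps=5):
    """Differentiable stand-in for Gibbs updates in a Boltzmann-machine
    posterior: each binary unit is relaxed to its conditional mean, so the
    unrolled chain stays on the autograd graph and gradients w.r.t. (W, b)
    flow through the updates. This mean-field relaxation is only a toy.
    """
    h = v0
    for _ in range(steps):
        h = torch.sigmoid(h @ W + b)  # conditional means, kept differentiable
    return h

n = 8
W0 = (0.1 * torch.randn(n, n)).requires_grad_()
b = torch.zeros(n, requires_grad=True)
W = W0.triu(1) + W0.triu(1).t()           # symmetric couplings, zero diagonal
h = unrolled_relaxed_gibbs(W, b, torch.rand(4, n))
h.sum().backward()                        # gradients reach W0 and b
print(W0.grad.shape)                      # torch.Size([8, 8])
```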