February 1, 2020

Paper Group AWR 134

Domain Knowledge Based Brain Tumor Segmentation and Overall Survival Prediction. Learning Semantic Annotations for Tabular Data. Training convolutional neural networks with cheap convolutions and online distillation. Working women and caste in India: A study of social disadvantage using feature attribution. Bayesian Loss for Crowd Count Estimation …

Domain Knowledge Based Brain Tumor Segmentation and Overall Survival Prediction


Title	Domain Knowledge Based Brain Tumor Segmentation and Overall Survival Prediction
Authors	Xiaoqing Guo, Chen Yang, Pak Lun Lam, Peter Y. M. Woo, Yixuan Yuan
Abstract	Automatically segmenting sub-regions of gliomas (necrosis, edema and enhancing tumor) and accurately predicting overall survival (OS) time from multimodal MRI sequences have important clinical significance in diagnosis, prognosis and treatment of gliomas. However, due to the high degree of variation in heterogeneous appearance and individual physical state, the segmentation of sub-regions and OS prediction are very challenging. To deal with these challenges, we utilize a 3D dilated multi-fiber network (DMFNet) with weighted dice loss for brain tumor segmentation, which incorporates prior knowledge of volume statistics and obtains a balance between small and large objects in MRI scans. For OS prediction, we propose a DenseNet based 3D neural network with a position encoding convolutional layer (PECL) to extract meaningful features from T1 contrast MRI, T2 MRI and previously segmented subregions. Both labeled data and unlabeled data are utilized to prevent over-fitting for semi-supervised learning. Those learned deep features along with handcrafted features (such as age and tumor volume) and position encoding segmentation features are fed to a Gradient Boosting Decision Tree (GBDT) to predict a specific OS day
Tasks	Brain Tumor Segmentation
Published	2019-12-16
URL	https://arxiv.org/abs/1912.07224v1
PDF	https://arxiv.org/pdf/1912.07224v1.pdf
PWC	https://paperswithcode.com/paper/domain-knowledge-based-brain-tumor
Repo	https://github.com/Guo-Xiaoqing/BraTS_OS
Framework	none
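
The weighted dice loss mentioned in the abstract can be illustrated with a generic sketch. This is not the authors' implementation: the per-class weights, the `eps` smoothing term, and the flat list layout are assumptions made for illustration, and how the paper derives its weights from volume statistics is not stated in the abstract.

```python
def weighted_dice_loss(probs, targets, weights, eps=1e-6):
    """Weighted soft Dice loss averaged over classes.

    probs[c][i]   : predicted probability of class c at voxel i
    targets[c][i] : 1.0 if voxel i belongs to class c, else 0.0
    weights[c]    : per-class weight (hypothetical; e.g. larger for small regions)
    """
    loss = 0.0
    for p_c, t_c, w_c in zip(probs, targets, weights):
        # Soft overlap between prediction and ground truth for this class.
        intersection = sum(p * t for p, t in zip(p_c, t_c))
        denom = sum(p_c) + sum(t_c)
        dice = (2.0 * intersection + eps) / (denom + eps)
        loss += w_c * (1.0 - dice)
    # Normalize so the loss stays in [0, 1] regardless of the weight scale.
    return loss / sum(weights)
```

Giving a larger weight to a small class (for example enhancing tumor) makes errors on that class count more, which is one plausible way to balance small and large objects.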

Learning Semantic Annotations for Tabular Data


Title	Learning Semantic Annotations for Tabular Data
Authors	Jiaoyan Chen, Ernesto Jimenez-Ruiz, Ian Horrocks, Charles Sutton
Abstract	The usefulness of tabular data such as web tables critically depends on understanding their semantics. This study focuses on column type prediction for tables without any metadata. Unlike traditional lexical matching-based methods, we propose a deep prediction model that can fully exploit a table’s contextual semantics, including table locality features learned by a Hybrid Neural Network (HNN), and inter-column semantics features learned by a knowledge base (KB) lookup and query answering algorithm. It exhibits good performance not only on individual table sets, but also when transferring from one table set to another.
Tasks
Published	2019-05-30
URL	https://arxiv.org/abs/1906.00781v1
PDF	https://arxiv.org/pdf/1906.00781v1.pdf
PWC	https://paperswithcode.com/paper/190600781
Repo	https://github.com/alan-turing-institute/SemAIDA
Framework	none

Training convolutional neural networks with cheap convolutions and online distillation


Title	Training convolutional neural networks with cheap convolutions and online distillation
Authors	Jiao Xie, Shaohui Lin, Yichen Zhang, Linkai Luo
Abstract	The large memory and computation consumption in convolutional neural networks (CNNs) has been one of the main barriers for deploying them on resource-limited systems. To this end, cheap convolutions (e.g., group convolution, depth-wise convolution, and shift convolution) have recently been used for memory and computation reduction, but they require specific architecture design. Furthermore, directly replacing the standard convolution with these cheap ones results in low discriminability of the compressed networks. In this paper, we propose to use knowledge distillation to improve the performance of the compact student networks with cheap convolutions. In our case, the teacher is a network with the standard convolution, while the student is a simple transformation of the teacher architecture without complicated redesigning. In particular, we propose a novel online distillation method, which constructs the teacher network online without pre-training and conducts mutual learning between the teacher and student network, to improve the performance of the student model. Extensive experiments demonstrate that the proposed approach achieves superior performance while simultaneously reducing the memory and computation overhead of cutting-edge CNNs on different datasets, including CIFAR-10/100 and ImageNet ILSVRC 2012, compared to the state-of-the-art CNN compression and acceleration methods. The codes are publicly available at https://github.com/EthanZhangYC/OD-cheap-convolution.
Tasks
Published	2019-09-28
URL	https://arxiv.org/abs/1909.13063v3
PDF	https://arxiv.org/pdf/1909.13063v3.pdf
PWC	https://paperswithcode.com/paper/training-convolutional-neural-networks-with-2
Repo	https://github.com/EthanZhangYC/OD-cheap-convolution
Framework	pytorch
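
The mutual learning between teacher and student described in the abstract is commonly expressed as a pair of KL-divergence terms between temperature-softened predictions. The sketch below shows that generic formulation only; the temperature value and the function names are assumptions, and the paper's exact loss may differ.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with q > 0 everywhere."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def mutual_distillation_terms(teacher_logits, student_logits, temperature=2.0):
    """Return (student_term, teacher_term) for one example."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    student_term = kl_divergence(p_teacher, p_student)  # student mimics teacher
    teacher_term = kl_divergence(p_student, p_teacher)  # teacher mimics student
    return student_term, teacher_term
```

In a typical setup each term would be added to that network's ordinary cross-entropy loss, so both networks are trained together from scratch.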


Working women and caste in India: A study of social disadvantage using feature attribution


Title	Working women and caste in India: A study of social disadvantage using feature attribution
Authors	Kuhu Joshi, Chaitanya K. Joshi
Abstract	Women belonging to the socially disadvantaged caste-groups in India have historically been engaged in labour-intensive, blue-collar work. We study whether there has been any change in the ability to predict a woman’s work-status and work-type based on her caste by interpreting machine learning models using feature attribution. We find that caste is now a less important determinant of work for the younger generation of women compared to the older generation. Moreover, younger women from disadvantaged castes are now more likely to be working in white-collar jobs.
Tasks
Published	2019-04-27
URL	https://arxiv.org/abs/1905.03092v2
PDF	https://arxiv.org/pdf/1905.03092v2.pdf
PWC	https://paperswithcode.com/paper/190503092
Repo	https://github.com/chaitjo/working-women
Framework	none

Bayesian Loss for Crowd Count Estimation with Point Supervision


Title	Bayesian Loss for Crowd Count Estimation with Point Supervision
Authors	Zhiheng Ma, Xing Wei, Xiaopeng Hong, Yihong Gong
Abstract	In crowd counting datasets, each person is annotated by a point, which is usually the center of the head, and the task is to estimate the total count in a crowd scene. Most of the state-of-the-art methods are based on density map estimation, which converts the sparse point annotations into a “ground truth” density map through a Gaussian kernel, and then uses it as the learning target to train a density map estimator. However, such a “ground-truth” density map is imperfect due to occlusions, perspective effects, variations in object shapes, etc. In contrast, we propose Bayesian loss, a novel loss function which constructs a density contribution probability model from the point annotations. Instead of constraining the value at every pixel in the density map, the proposed training loss adopts a more reliable supervision on the count expectation at each annotated point. Without bells and whistles, the loss function makes substantial improvements over the baseline loss on all tested datasets. Moreover, our proposed loss function equipped with a standard backbone network, without using any external detectors or multi-scale architectures, performs favourably against the state of the art. Our method outperforms previous best approaches by a large margin on the latest and largest UCF-QNRF dataset. The source code is available at https://github.com/ZhihengCV/Baysian-Crowd-Counting.
Tasks	Crowd Counting
Published	2019-08-10
URL	https://arxiv.org/abs/1908.03684v1
PDF	https://arxiv.org/pdf/1908.03684v1.pdf
PWC	https://paperswithcode.com/paper/bayesian-loss-for-crowd-count-estimation-with
Repo	https://github.com/ZhihengCV/Bayesian-Crowd-Counting
Framework	pytorch
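
To make "supervision on the count expectation at each annotated point" concrete, here is a minimal sketch of one reading of the idea: each pixel's predicted density is split among the annotated points by a Gaussian posterior, and each point's expected count is pushed toward 1. The kernel width `sigma`, the flat data layout, and the absence of any background term are assumptions; see the linked repository for the authors' version.

```python
import math


def bayesian_loss(density, pixel_coords, points, sigma=8.0):
    """density[m]: predicted density at pixel m.
    pixel_coords[m]: (x, y) of pixel m.
    points: annotated head positions as (x, y) tuples.
    """
    expected = [0.0] * len(points)
    for d_m, (px, py) in zip(density, pixel_coords):
        # Gaussian likelihood of this pixel under each annotated point.
        lik = [
            math.exp(-((px - zx) ** 2 + (py - zy) ** 2) / (2.0 * sigma ** 2))
            for zx, zy in points
        ]
        total = sum(lik)
        if total == 0.0:
            continue  # pixel is too far from every point to be assigned
        for n, l_n in enumerate(lik):
            # Posterior share of this pixel's density that goes to point n.
            expected[n] += (l_n / total) * d_m
    # Each annotated person should account for an expected count of one.
    return sum(abs(1.0 - e) for e in expected)
```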

PCC Net: Perspective Crowd Counting via Spatial Convolutional Network


Title	PCC Net: Perspective Crowd Counting via Spatial Convolutional Network
Authors	Junyu Gao, Qi Wang, Xuelong Li
Abstract	Crowd counting from a single image is a challenging task due to high appearance similarity, perspective changes and severe congestion. Many methods only focus on the local appearance features and they cannot handle the aforementioned challenges. In order to tackle them, we propose a Perspective Crowd Counting Network (PCC Net), which consists of three parts: 1) Density Map Estimation (DME) focuses on learning very local features for density map estimation; 2) Random High-level Density Classification (R-HDC) extracts global features to predict the coarse density labels of random patches in images; 3) Fore-/Background Segmentation (FBS) encodes mid-level features to segment the foreground and background. Besides, the DULR module is embedded in PCC Net to encode the perspective changes in four directions (Down, Up, Left and Right). The proposed PCC Net is verified on five mainstream datasets; it achieves state-of-the-art performance on one and attains competitive results on the other four. The source code is available at https://github.com/gjy3035/PCC-Net.
Tasks	Crowd Counting
Published	2019-05-24
URL	https://arxiv.org/abs/1905.10085v1
PDF	https://arxiv.org/pdf/1905.10085v1.pdf
PWC	https://paperswithcode.com/paper/pcc-net-perspective-crowd-counting-via
Repo	https://github.com/gjy3035/PCC-Net
Framework	pytorch

MOSS: End-to-End Dialog System Framework with Modular Supervision


Title	MOSS: End-to-End Dialog System Framework with Modular Supervision
Authors	Weixin Liang, Youzhi Tian, Chengcai Chen, Zhou Yu
Abstract	A major bottleneck in training end-to-end task-oriented dialog systems is the lack of data. To utilize limited training data more efficiently, we propose Modular Supervision Network (MOSS), an encoder-decoder training framework that can incorporate supervision from various intermediate dialog system modules including natural language understanding, dialog state tracking, dialog policy learning, and natural language generation. With only 60% of the training data, MOSS-all (i.e., MOSS with supervision from all four dialog modules) outperforms state-of-the-art models on CamRest676. Moreover, introducing modular supervision has even bigger benefits when the dialog task has a more complex dialog state and action space. With only 40% of the training data, MOSS-all outperforms the state-of-the-art model on a complex laptop network troubleshooting dataset, LaptopNetwork, that we introduced. LaptopNetwork consists of conversations between real customers and customer service agents in Chinese. In addition, the MOSS framework can accommodate dialogs that have supervision from different dialog modules at both the framework level and model level. Therefore, MOSS is extremely flexible to update in a real-world deployment.
Tasks	Text Generation
Published	2019-09-12
URL	https://arxiv.org/abs/1909.05528v1
PDF	https://arxiv.org/pdf/1909.05528v1.pdf
PWC	https://paperswithcode.com/paper/moss-end-to-end-dialog-system-framework-with
Repo	https://github.com/YouzhiTian/MOSS-End-to-End-Dialog-System-Framework-with-Modular-Supervision
Framework	pytorch

Teaching a New Dog Old Tricks: Resurrecting Multilingual Retrieval Using Zero-shot Learning


Title	Teaching a New Dog Old Tricks: Resurrecting Multilingual Retrieval Using Zero-shot Learning
Authors	Sean MacAvaney, Luca Soldaini, Nazli Goharian
Abstract	While billions of non-English speaking users rely on search engines every day, the problem of ad-hoc information retrieval is rarely studied for non-English languages. This is primarily due to a lack of data sets that are suitable to train ranking algorithms. In this paper, we tackle the lack of data by leveraging pre-trained multilingual language models to transfer a retrieval system trained on English collections to non-English queries and documents. Our model is evaluated in a zero-shot setting, meaning that we use it to predict relevance scores for query-document pairs in languages never seen during training. Our results show that the proposed approach can significantly outperform unsupervised retrieval techniques for Arabic, Chinese Mandarin, and Spanish. We also show that augmenting the English training collection with some examples from the target language can sometimes improve performance.
Tasks	Ad-Hoc Information Retrieval, Information Retrieval, Zero-Shot Learning
Published	2019-12-30
URL	https://arxiv.org/abs/1912.13080v1
PDF	https://arxiv.org/pdf/1912.13080v1.pdf
PWC	https://paperswithcode.com/paper/teaching-a-new-dog-old-tricks-resurrecting
Repo	https://github.com/Georgetown-IR-Lab/multilingual-neural-ir
Framework	none


Predicting Diffusion Reach Probabilities via Representation Learning on Social Networks


Title	Predicting Diffusion Reach Probabilities via Representation Learning on Social Networks
Authors	Furkan Gursoy, Ahmet Onur Durahim
Abstract	Diffusion reach probability between two nodes on a network is defined as the probability of a cascade originating from one node reaching another node. An infinite number of cascades would enable calculation of true diffusion reach probabilities between any two nodes. However, there exists only a finite number of cascades and one usually has access only to a small portion of all available cascades. In this work, we address the problem of estimating diffusion reach probabilities given only a limited number of cascades and partial information about underlying network structure. Our proposed strategy employs node representation learning to generate and feed node embeddings into machine learning algorithms to create models that predict diffusion reach probabilities. We provide experimental analysis using synthetically generated cascades on two real-world social networks. Results show that the proposed method is superior to using values calculated from available cascades when the portion of cascades is small.
Tasks	Representation Learning
Published	2019-01-12
URL	http://arxiv.org/abs/1901.03829v1
PDF	http://arxiv.org/pdf/1901.03829v1.pdf
PWC	https://paperswithcode.com/paper/predicting-diffusion-reach-probabilities-via
Repo	https://github.com/furkangursoy/RLforDiffPred
Framework	none
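
The baseline the abstract compares against, "values calculated from available cascades", can be read as a simple frequency estimate: the fraction of cascades started at one node that reached another. The sketch below shows that estimate under an assumed input format (a list of `(source, reached_nodes)` pairs); it is not taken from the linked repository.

```python
from collections import defaultdict


def empirical_reach_probabilities(cascades):
    """cascades: list of (source, set_of_reached_nodes) pairs.

    Returns {(source, target): estimated probability that a cascade
    originating at source reaches target}.
    """
    started = defaultdict(int)
    reached = defaultdict(int)
    for source, nodes in cascades:
        started[source] += 1
        for node in nodes:
            if node != source:
                reached[(source, node)] += 1
    # Pairs never observed together are simply absent from the result.
    return {pair: count / started[pair[0]] for pair, count in reached.items()}
```

With few cascades per source this estimate is coarse and leaves most pairs without a value, which is the gap the embedding-based predictor is meant to fill.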

Matrix Nets: A New Deep Architecture for Object Detection


Title	Matrix Nets: A New Deep Architecture for Object Detection
Authors	Abdullah Rashwan, Agastya Kalra, Pascal Poupart
Abstract	We present Matrix Nets (xNets), a new deep architecture for object detection. xNets map objects with different sizes and aspect ratios into layers where the sizes and the aspect ratios of the objects within their layers are nearly uniform. Hence, xNets provide a scale and aspect ratio aware architecture. We leverage xNets to enhance key-points based object detection. Our architecture achieves mAP of 47.8 on MS COCO, which is higher than any other single-shot detector while using half the number of parameters and training 3x faster than the next best architecture.
Tasks	Object Detection
Published	2019-08-13
URL	https://arxiv.org/abs/1908.04646v2
PDF	https://arxiv.org/pdf/1908.04646v2.pdf
PWC	https://paperswithcode.com/paper/matrix-nets-a-new-deep-architecture-for
Repo	https://github.com/lizhe960118/CenterNet.git
Framework	none

Recombinator-k-means: A population based algorithm that exploits k-means++ for recombination


Title	Recombinator-k-means: A population based algorithm that exploits k-means++ for recombination
Authors	Carlo Baldassi
Abstract	We present a simple heuristic algorithm for efficiently optimizing the notoriously hard “minimum sum-of-squares clustering” problem, usually addressed by the classical k-means heuristic and its variants. The algorithm, called recombinator-k-means, is very similar to a genetic algorithmic scheme: it uses populations of configurations that are optimized independently in parallel and then recombined in a next-iteration population batch by exploiting a variant of the k-means++ seeding algorithm. An additional reweighting mechanism ensures that the population eventually coalesces into a single solution. Extensive tests measuring optimization objective vs computational time on synthetic and real-world data show that it is the only choice, among state-of-the-art alternatives (simple restarts, random swap, genetic algorithm with pairwise-nearest-neighbor crossover), that consistently produces good results at all time scales, outperforming competitors on large and complicated datasets. The only parameter that requires tuning is the population size. The scheme is rather general (it could be applied even to k-medians or k-medoids, for example). Our implementation is publicly available at https://github.com/carlobaldassi/RecombinatorKMeans.jl.
Tasks
Published	2019-05-01
URL	https://arxiv.org/abs/1905.00531v3
PDF	https://arxiv.org/pdf/1905.00531v3.pdf
PWC	https://paperswithcode.com/paper/190500531
Repo	https://github.com/carlobaldassi/RecombinatorKMeans.jl
Framework	none
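
For readers unfamiliar with the seeding step the abstract builds on, here is a sketch of standard k-means++ seeding: each new center is sampled with probability proportional to the squared distance to the nearest center chosen so far. This is the textbook procedure, not the variant used for recombination in the paper, and it is written in Python rather than the Julia of the authors' repository.

```python
import random


def kmeanspp_seeds(points, k, rng=None):
    """Pick k initial centers from points (tuples of floats) by D^2 sampling."""
    rng = rng or random.Random(0)
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Squared distance from each point to its nearest chosen center.
        d2 = [
            min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
            for p in points
        ]
        if sum(d2) == 0.0:
            # Every point coincides with a center; fall back to uniform choice.
            centers.append(rng.choice(points))
        else:
            centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers
```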

Show Your Work: Improved Reporting of Experimental Results


Title	Show Your Work: Improved Reporting of Experimental Results
Authors	Jesse Dodge, Suchin Gururangan, Dallas Card, Roy Schwartz, Noah A. Smith
Abstract	Research in natural language processing proceeds, in part, by demonstrating that new models achieve superior performance (e.g., accuracy) on held-out test data, compared to previous results. In this paper, we demonstrate that test-set performance scores alone are insufficient for drawing accurate conclusions about which model performs best. We argue for reporting additional details, especially performance on validation data obtained during model development. We present a novel technique for doing so: expected validation performance of the best-found model as a function of computation budget (i.e., the number of hyperparameter search trials or the overall training time). Using our approach, we find multiple recent model comparisons where authors would have reached a different conclusion if they had used more (or less) computation. Our approach also allows us to estimate the amount of computation required to obtain a given accuracy; applying it to several recently published results yields massive variation across papers, from hours to weeks. We conclude with a set of best practices for reporting experimental results which allow for robust future comparisons, and provide code to allow researchers to use our technique.
Tasks
Published	2019-09-06
URL	https://arxiv.org/abs/1909.03004v1
PDF	https://arxiv.org/pdf/1909.03004v1.pdf
PWC	https://paperswithcode.com/paper/show-your-work-improved-reporting-of
Repo	https://github.com/allenai/show-your-work
Framework	none
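
The "expected validation performance of the best-found model as a function of computation budget" can be estimated from a set of observed validation scores as the expected maximum of n draws from their empirical distribution. The sketch below assumes draws are i.i.d. with replacement and that the budget is counted in hyperparameter trials; the function name is hypothetical and the authors' released code should be preferred for real reporting.

```python
def expected_max_performance(scores, n):
    """Expected best validation score after n trials, estimated from
    observed scores by sampling with replacement from their empirical
    distribution."""
    ordered = sorted(scores)
    total = len(ordered)
    expected = 0.0
    for i, v in enumerate(ordered, start=1):
        # Probability that the maximum of n draws is the i-th smallest sample:
        # P(max <= v_i) - P(max <= v_{i-1}) under the empirical CDF.
        expected += v * ((i / total) ** n - ((i - 1) / total) ** n)
    return expected
```

Plotting this value for n = 1, 2, 3, ... gives a curve per model, so two models can be compared at equal budget rather than at whatever budget each paper happened to use.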

Decision-Directed Data Decomposition


Title	Decision-Directed Data Decomposition
Authors	Brent D. Davis, Ethan Jackson, Daniel J. Lizotte
Abstract	We present an algorithm, Decision-Directed Data Decomposition (D4), which decomposes a dataset into two components. The first contains most of the useful information for a specified supervised learning task. The second orthogonal component contains little information about the task but retains associations and information that were not targeted. The algorithm is simple and scalable. We illustrate its application in image and text processing domains. Our results show that 1) post-hoc application of D4 to an image representation space can remove information about specified concepts without impacting other concepts, 2) D4 is able to improve predictive generalization in certain settings, and 3) applying D4 to word embedding representations produces state-of-the-art results in debiasing.
Tasks	Word Embeddings
Published	2019-09-18
URL	https://arxiv.org/abs/1909.08159v2
PDF	https://arxiv.org/pdf/1909.08159v2.pdf
PWC	https://paperswithcode.com/paper/decision-directed-data-decomposition
Repo	https://github.com/bdavis56/DDDD
Framework	none

Robustness Certificates for Sparse Adversarial Attacks by Randomized Ablation


Title	Robustness Certificates for Sparse Adversarial Attacks by Randomized Ablation
Authors	Alexander Levine, Soheil Feizi
Abstract	Recently, techniques have been developed to provably guarantee the robustness of a classifier to adversarial perturbations of bounded L_1 and L_2 magnitudes by using randomized smoothing: the robust classification is a consensus of base classifications on randomly noised samples where the noise is additive. In this paper, we extend this technique to the L_0 threat model. We propose an efficient and certifiably robust defense against sparse adversarial attacks by randomly ablating input features, rather than using additive noise. Experimentally, on MNIST, we can certify the classifications of over 50% of images to be robust to any distortion of at most 8 pixels. This is comparable to the observed empirical robustness of unprotected classifiers on MNIST to modern L_0 attacks, demonstrating the tightness of the proposed robustness certificate. We also evaluate our certificate on ImageNet and CIFAR-10. Our certificates represent an improvement on those provided in a concurrent work (Lee et al. 2019) which uses random noise rather than ablation (median certificates of 8 pixels versus 4 pixels on MNIST; 16 pixels versus 1 pixel on ImageNet). Additionally, we empirically demonstrate that our classifier is highly robust to modern sparse adversarial attacks on MNIST. Our classifications are robust, in median, to adversarial perturbations of up to 31 pixels, compared to 22 pixels reported as the state-of-the-art defense, at the cost of a slight decrease (around 2.3%) in the classification accuracy. Code is available at https://github.com/alevine0/randomizedAblation/.
Tasks
Published	2019-11-21
URL	https://arxiv.org/abs/1911.09272v1
PDF	https://arxiv.org/pdf/1911.09272v1.pdf
PWC	https://paperswithcode.com/paper/robustness-certificates-for-sparse
Repo	https://github.com/alevine0/randomizedAblation
Framework	pytorch
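
A rough sketch of the ablation idea: keep a random subset of input features, replace the rest with a null marker, and take a majority vote of a base classifier over many such ablated copies. The helper names, the `None` null marker, and the trial count are assumptions. `overlap_probability` computes the chance that a random retained set touches at least one of `rho` modified features, which is the quantity a certificate of this kind depends on; the exact certification condition is in the paper and is not reproduced here.

```python
import math
import random
from collections import Counter


def ablate(x, keep, rng, null=None):
    """Keep `keep` randomly chosen features of x; replace the rest with null."""
    kept = set(rng.sample(range(len(x)), keep))
    return [v if i in kept else null for i, v in enumerate(x)]


def smoothed_predict(base_classifier, x, keep, trials=100, seed=0):
    """Majority vote of base_classifier over randomly ablated copies of x."""
    rng = random.Random(seed)
    votes = Counter(base_classifier(ablate(x, keep, rng)) for _ in range(trials))
    return votes.most_common(1)[0][0], votes


def overlap_probability(d, keep, rho):
    """Probability that a uniformly random set of `keep` retained features
    (out of d) contains at least one of rho adversarially modified features."""
    return 1.0 - math.comb(d - rho, keep) / math.comb(d, keep)
```

Because an attacker who changes only a few pixels rarely hits the small retained set, most votes are unaffected, which is the intuition behind robustness to sparse attacks.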

3D Ken Burns Effect from a Single Image


Title	3D Ken Burns Effect from a Single Image
Authors	Simon Niklaus, Long Mai, Jimei Yang, Feng Liu
Abstract	The Ken Burns effect allows animating still images with a virtual camera scan and zoom. Adding parallax, which results in the 3D Ken Burns effect, enables significantly more compelling results. Creating such effects manually is time-consuming and demands sophisticated editing skills. Existing automatic methods, however, require multiple input images from varying viewpoints. In this paper, we introduce a framework that synthesizes the 3D Ken Burns effect from a single image, supporting both a fully automatic mode and an interactive mode with the user controlling the camera. Our framework first leverages a depth prediction pipeline, which estimates scene depth that is suitable for view synthesis tasks. To address the limitations of existing depth estimation methods such as geometric distortions, semantic distortions, and inaccurate depth boundaries, we develop a semantic-aware neural network for depth prediction, couple its estimate with a segmentation-based depth adjustment process, and employ a refinement neural network that facilitates accurate depth predictions at object boundaries. According to this depth estimate, our framework then maps the input image to a point cloud and synthesizes the resulting video frames by rendering the point cloud from the corresponding camera positions. To address disocclusions while maintaining geometrically and temporally coherent synthesis results, we utilize context-aware color- and depth-inpainting to fill in the missing information in the extreme views of the camera path, thus extending the scene geometry of the point cloud. Experiments with a wide variety of image content show that our method enables realistic synthesis results. Our study demonstrates that our system allows users to achieve better results while requiring little effort compared to existing solutions for the 3D Ken Burns effect creation.
Tasks	Depth Estimation
Published	2019-09-12
URL	https://arxiv.org/abs/1909.05483v1
PDF	https://arxiv.org/pdf/1909.05483v1.pdf
PWC	https://paperswithcode.com/paper/3d-ken-burns-effect-from-a-single-image
Repo	https://github.com/sniklaus/3d-ken-burns
Framework	pytorch
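
The step "maps the input image to a point cloud" can be illustrated with standard pinhole back-projection of a depth map. The focal length and principal point parameters below are assumed inputs, and the nested-list depth layout is for illustration only; the paper's pipeline involves considerably more (depth refinement, inpainting) than this sketch.

```python
def depth_to_point_cloud(depth, focal, cx, cy):
    """Back-project a depth map into 3D camera-space points.

    depth[v][u] : depth at pixel row v, column u
    focal       : focal length in pixels (assumed equal for x and y)
    cx, cy      : principal point in pixels
    Returns a list of (X, Y, Z) tuples in row-major pixel order.
    """
    cloud = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            # Pinhole model: a pixel offset from the principal point scales
            # linearly with depth.
            cloud.append(((u - cx) * z / focal, (v - cy) * z / focal, z))
    return cloud
```

Rendering this point cloud from a slightly shifted virtual camera is what produces parallax: nearby points move more in the image than distant ones.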