Paper Group AWR 154
Understanding Convolution for Semantic Segmentation. A systematic study of the class imbalance problem in convolutional neural networks. The Consciousness Prior. Cost-Effective Active Learning for Melanoma Segmentation. Counterfactual Explanations without Opening the Black Box: Automated Decisions and the GDPR. Dense 3D Regression for Hand Pose Estimation. Identifying networks with common organizational principles. Intelligent Word Embeddings of Free-Text Radiology Reports. Collaborative Nested Sampling: Big Data vs. complex physical models. Stochastic Non-convex Ordinal Embedding with Stabilized Barzilai-Borwein Step Size. 3D Densely Convolutional Networks for Volumetric Segmentation. MicroExpNet: An Extremely Small and Fast Model For Expression Recognition From Face Images. Towards Automatic Learning of Procedures from Web Instructional Videos. Glyph-aware Embedding of Chinese Characters. Neural AMR: Sequence-to-Sequence Models for Parsing and Generation.
Understanding Convolution for Semantic Segmentation
Title | Understanding Convolution for Semantic Segmentation |
Authors | Panqu Wang, Pengfei Chen, Ye Yuan, Ding Liu, Zehua Huang, Xiaodi Hou, Garrison Cottrell |
Abstract | Recent advances in deep learning, especially deep convolutional neural networks (CNNs), have led to significant improvement over previous semantic segmentation systems. Here we show how to improve pixel-wise semantic segmentation by manipulating convolution-related operations that are of both theoretical and practical value. First, we design dense upsampling convolution (DUC) to generate pixel-level prediction, which is able to capture and decode more detailed information that is generally missing in bilinear upsampling. Second, we propose a hybrid dilated convolution (HDC) framework in the encoding phase. This framework 1) effectively enlarges the receptive fields (RF) of the network to aggregate global information; 2) alleviates what we call the “gridding issue” caused by the standard dilated convolution operation. We evaluate our approaches thoroughly on the Cityscapes dataset, and achieve a state-of-the-art result of 80.1% mIOU on the test set at the time of submission. We have also achieved state-of-the-art results on the KITTI road estimation benchmark and the PASCAL VOC2012 segmentation task. Our source code can be found at https://github.com/TuSimple/TuSimple-DUC . |
Tasks | Semantic Segmentation |
Published | 2017-02-27 |
URL | http://arxiv.org/abs/1702.08502v3 |
http://arxiv.org/pdf/1702.08502v3.pdf | |
PWC | https://paperswithcode.com/paper/understanding-convolution-for-semantic |
Repo | https://github.com/leemathew1998/GradientWeight |
Framework | pytorch |
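The DUC and HDC ideas above translate into very little code. Below is a minimal PyTorch-style sketch, not the authors' released implementation: the channel counts, upscale factor, and the (1, 2, 5) dilation schedule are illustrative assumptions.

```python
# Minimal sketch of Dense Upsampling Convolution (DUC) and Hybrid Dilated
# Convolution (HDC) as described in the abstract. Channel counts and the
# dilation schedule are illustrative assumptions, not the paper's exact config.
import torch
import torch.nn as nn

class DUCHead(nn.Module):
    """Predict num_classes * r^2 channels at low resolution, then rearrange
    them into a full-resolution score map instead of bilinear upsampling."""
    def __init__(self, in_channels, num_classes, upscale=8):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, num_classes * upscale ** 2,
                              kernel_size=3, padding=1)
        self.shuffle = nn.PixelShuffle(upscale)  # (C*r^2, H, W) -> (C, H*r, W*r)

    def forward(self, x):
        return self.shuffle(self.conv(x))

def hdc_block(channels, dilations=(1, 2, 5)):
    """Stack dilated 3x3 convolutions whose rates share no common factor,
    which is one way to avoid the 'gridding' artifact."""
    layers = []
    for d in dilations:
        layers += [nn.Conv2d(channels, channels, 3, padding=d, dilation=d),
                   nn.BatchNorm2d(channels), nn.ReLU(inplace=True)]
    return nn.Sequential(*layers)

features = hdc_block(64)
head = DUCHead(64, num_classes=19, upscale=8)          # 19 classes as in Cityscapes
scores = head(features(torch.randn(1, 64, 32, 64)))    # -> (1, 19, 256, 512)
```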
A systematic study of the class imbalance problem in convolutional neural networks
Title | A systematic study of the class imbalance problem in convolutional neural networks |
Authors | Mateusz Buda, Atsuto Maki, Maciej A. Mazurowski |
Abstract | In this study, we systematically investigate the impact of class imbalance on classification performance of convolutional neural networks (CNNs) and compare frequently used methods to address the issue. Class imbalance is a common problem that has been comprehensively studied in classical machine learning, yet very limited systematic research is available in the context of deep learning. In our study, we use three benchmark datasets of increasing complexity, MNIST, CIFAR-10 and ImageNet, to investigate the effects of imbalance on classification and perform an extensive comparison of several methods to address the issue: oversampling, undersampling, two-phase training, and thresholding that compensates for prior class probabilities. Our main evaluation metric is area under the receiver operating characteristic curve (ROC AUC) adjusted to multi-class tasks, since the overall accuracy metric is associated with notable difficulties in the context of imbalanced data. Based on the results from our experiments, we conclude that (i) the effect of class imbalance on classification performance is detrimental; (ii) the method of addressing class imbalance that emerged as dominant in almost all analyzed scenarios was oversampling; (iii) oversampling should be applied to the level that completely eliminates the imbalance, whereas the optimal undersampling ratio depends on the extent of imbalance; (iv) as opposed to some classical machine learning models, oversampling does not cause overfitting of CNNs; (v) thresholding should be applied to compensate for prior class probabilities when the overall number of properly classified cases is of interest. |
Tasks | |
Published | 2017-10-15 |
URL | http://arxiv.org/abs/1710.05381v2 |
http://arxiv.org/pdf/1710.05381v2.pdf | |
PWC | https://paperswithcode.com/paper/a-systematic-study-of-the-class-imbalance |
Repo | https://github.com/ferhatkkochan/deeplearning-notes |
Framework | none |
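Two of the remedies compared in the study, random oversampling to full balance and thresholding by prior class probabilities, are easy to sketch. The NumPy snippet below is a hedged illustration of those two ideas only; it does not reproduce the paper's experimental protocol.

```python
# Hedged sketch of two of the compared remedies: random oversampling to full
# balance and thresholding by prior class probabilities. Pure NumPy.
import numpy as np

def oversample_to_balance(X, y, rng=np.random.default_rng(0)):
    """Duplicate minority-class examples until every class matches the
    largest class count (the setting the study found to work best)."""
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    idx = []
    for c, n in zip(classes, counts):
        cls_idx = np.where(y == c)[0]
        extra = rng.choice(cls_idx, size=target - n, replace=True)
        idx.append(np.concatenate([cls_idx, extra]))
    idx = rng.permutation(np.concatenate(idx))
    return X[idx], y[idx]

def threshold_by_prior(probs, priors):
    """Compensate for class priors at test time: divide the network's
    posterior estimates by the training-set priors before taking argmax."""
    adjusted = probs / np.asarray(priors)
    return adjusted.argmax(axis=1)
```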
The Consciousness Prior
Title | The Consciousness Prior |
Authors | Yoshua Bengio |
Abstract | A new prior is proposed for learning representations of high-level concepts of the kind we manipulate with language. This prior can be combined with other priors in order to help disentangle abstract factors from each other. It is inspired by cognitive neuroscience theories of consciousness, seen as a bottleneck through which just a few elements, after having been selected by attention from a broader pool, are then broadcast and condition further processing, both in perception and decision-making. The set of recently selected elements one becomes aware of is seen as forming a low-dimensional conscious state. This conscious state combines the few concepts constituting a conscious thought, i.e., what one is immediately conscious of at a particular moment. We claim that this architectural and information-processing constraint corresponds to assumptions about the joint distribution between high-level concepts. To the extent that these assumptions are generally true (and the form of natural language seems consistent with them), they can form a useful prior for representation learning. A low-dimensional thought or conscious state is analogous to a sentence: it involves only a few variables and yet can make a statement with very high probability of being true. This is consistent with a joint distribution (over high-level concepts) which has the form of a sparse factor graph, i.e., where the dependencies captured by each factor of the factor graph involve only very few variables while creating a strong dip in the overall energy function. The consciousness prior also makes it natural to map conscious states to natural language utterances or to express classical AI knowledge in a form similar to facts and rules, albeit capturing uncertainty as well as efficient search mechanisms implemented by attention mechanisms. |
Tasks | Decision Making, Representation Learning |
Published | 2017-09-25 |
URL | https://arxiv.org/abs/1709.08568v2 |
https://arxiv.org/pdf/1709.08568v2.pdf | |
PWC | https://paperswithcode.com/paper/the-consciousness-prior |
Repo | https://github.com/off99555/machine-learning-curriculum |
Framework | tf |
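The paper is a position piece and gives no reference implementation, but the attention-bottleneck idea can be caricatured in a few lines. The snippet below is purely illustrative: the scoring function, the value of k, and all shapes are assumptions made here, not the author's proposal.

```python
# Illustrative only: a top-k attention bottleneck that selects a few elements
# of a high-dimensional representation to form a low-dimensional "conscious
# state". Shapes, scoring function and k are assumptions, not from the paper.
import numpy as np

def conscious_state(h, w_score, k=4):
    """h: (d,) full unconscious representation; w_score: (d,) attention weights.
    Returns the indices and values of the k attended elements."""
    scores = w_score * h                        # simple elementwise relevance score
    top = np.argsort(-np.abs(scores))[:k]       # keep only a few elements
    weights = np.exp(scores[top] - scores[top].max())
    weights /= weights.sum()                    # softmax over the selected few
    return top, weights * h[top]

h = np.random.randn(512)
idx, state = conscious_state(h, w_score=np.random.randn(512))
print(idx, state)   # a sparse, low-dimensional summary of h
```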
Cost-Effective Active Learning for Melanoma Segmentation
Title | Cost-Effective Active Learning for Melanoma Segmentation |
Authors | Marc Gorriz, Axel Carlier, Emmanuel Faure, Xavier Giro-i-Nieto |
Abstract | We propose a novel Active Learning framework capable of effectively training a convolutional neural network for semantic segmentation of medical imaging with a limited amount of labeled training data. Our contribution is a practical Cost-Effective Active Learning approach that uses dropout at test time as Monte Carlo sampling to model the pixel-wise uncertainty and to analyze the image information in order to improve the training performance. The source code of this project is available at https://marc-gorriz.github.io/CEAL-Medical-Image-Segmentation/ . |
Tasks | Active Learning, Medical Image Segmentation, Semantic Segmentation |
Published | 2017-11-24 |
URL | http://arxiv.org/abs/1711.09168v2 |
http://arxiv.org/pdf/1711.09168v2.pdf | |
PWC | https://paperswithcode.com/paper/cost-effective-active-learning-for-melanoma |
Repo | https://github.com/marc-gorriz/CEAL-Medical-Image-Segmentation |
Framework | tf |
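The core uncertainty step, Monte Carlo dropout at test time, can be sketched independently of any particular segmentation network. In the snippet below, `predict_with_dropout` is a hypothetical hook that runs one stochastic forward pass with dropout enabled; the number of passes and the ranking rule are illustrative.

```python
# Sketch of the core uncertainty step: Monte Carlo dropout at test time to
# rank unlabeled images for annotation. `predict_with_dropout` is a
# hypothetical hook into whatever segmentation network is used.
import numpy as np

def mc_dropout_uncertainty(image, predict_with_dropout, T=10):
    """Run T stochastic forward passes (dropout left on) and use the mean
    per-pixel variance of the foreground probability as an uncertainty score."""
    samples = np.stack([predict_with_dropout(image) for _ in range(T)])  # (T, H, W)
    return samples.var(axis=0).mean()

def select_for_labeling(unlabeled, predict_with_dropout, budget=10):
    """Cost-effective AL loop body: send the most uncertain images to an
    annotator; confident ones could instead be pseudo-labeled."""
    scores = [mc_dropout_uncertainty(img, predict_with_dropout) for img in unlabeled]
    order = np.argsort(scores)[::-1]
    return [unlabeled[i] for i in order[:budget]]
```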
Counterfactual Explanations without Opening the Black Box: Automated Decisions and the GDPR
Title | Counterfactual Explanations without Opening the Black Box: Automated Decisions and the GDPR |
Authors | Sandra Wachter, Brent Mittelstadt, Chris Russell |
Abstract | There has been much discussion of the right to explanation in the EU General Data Protection Regulation, and its existence, merits, and disadvantages. Implementing a right to explanation that opens the black box of algorithmic decision-making faces major legal and technical barriers. Explaining the functionality of complex algorithmic decision-making systems and their rationale in specific cases is a technically challenging problem. Some explanations may offer little meaningful information to data subjects, raising questions around their value. Explanations of automated decisions need not hinge on the general public understanding how algorithmic systems function. Even though such interpretability is of great importance and should be pursued, explanations can, in principle, be offered without opening the black box. Looking at explanations as a means to help a data subject act rather than merely understand, one could gauge the scope and content of explanations according to the specific goal or action they are intended to support. From the perspective of individuals affected by automated decision-making, we propose three aims for explanations: (1) to inform and help the individual understand why a particular decision was reached, (2) to provide grounds to contest the decision if the outcome is undesired, and (3) to understand what would need to change in order to receive a desired result in the future, based on the current decision-making model. We assess how each of these goals finds support in the GDPR. We suggest data controllers should offer a particular type of explanation, unconditional counterfactual explanations, to support these three aims. These counterfactual explanations describe the smallest change to the world that can be made to obtain a desirable outcome, or to arrive at the closest possible world, without needing to explain the internal logic of the system. |
Tasks | Decision Making |
Published | 2017-11-01 |
URL | http://arxiv.org/abs/1711.00399v3 |
http://arxiv.org/pdf/1711.00399v3.pdf | |
PWC | https://paperswithcode.com/paper/counterfactual-explanations-without-opening |
Repo | https://github.com/microsoft/DiCE |
Framework | tf |
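The counterfactual search described above amounts to a small optimization problem: find the closest point to the input that the model maps to the desired outcome. The sketch below uses a plain L1 distance and a generic optimizer for brevity (the paper weights the distance, e.g. by median absolute deviation); `predict_proba` is a hypothetical model interface, and the loss form is an illustrative choice.

```python
# Hedged sketch of an unconditional counterfactual search: find the smallest
# change to x that moves the classifier's output to the desired class.
import numpy as np
from scipy.optimize import minimize

def counterfactual(x, predict_proba, desired_class, lam=10.0):
    """Minimize lam * (p_desired(x') - 1)^2 + ||x' - x||_1 over x'."""
    def loss(x_prime):
        p = predict_proba(x_prime)[desired_class]   # hypothetical model interface
        return lam * (p - 1.0) ** 2 + np.abs(x_prime - x).sum()
    res = minimize(loss, x0=x.copy(), method="Nelder-Mead")
    return res.x   # the counterfactual: closest input receiving the desired outcome
```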
Dense 3D Regression for Hand Pose Estimation
Title | Dense 3D Regression for Hand Pose Estimation |
Authors | Chengde Wan, Thomas Probst, Luc Van Gool, Angela Yao |
Abstract | We present a simple and effective method for 3D hand pose estimation from a single depth frame. As opposed to previous state-of-the-art methods based on holistic 3D regression, our method works on dense pixel-wise estimation. This is achieved by careful design choices in pose parameterization, which leverages both 2D and 3D properties of the depth map. Specifically, we decompose the pose parameters into a set of per-pixel estimations, i.e., 2D heat maps, 3D heat maps and unit 3D directional vector fields. The 2D/3D joint heat maps and 3D joint offsets are estimated via multi-task network cascades, which are trained end-to-end. The pixel-wise estimations can be directly translated into a vote casting scheme. A variant of mean shift is then used to aggregate local votes while enforcing consensus between the estimated 3D pose and the pixel-wise 2D and 3D estimations by design. Our method is efficient and highly accurate. On the MSRA and NYU hand datasets, our method outperforms all previous state-of-the-art approaches by a large margin. On the ICVL hand dataset, our method achieves accuracy similar to the nearly saturated results reported to date and outperforms various other proposed methods. Code is available online at https://github.com/melonwan/denseReg . |
Tasks | Hand Pose Estimation, Pose Estimation |
Published | 2017-11-24 |
URL | http://arxiv.org/abs/1711.08996v1 |
http://arxiv.org/pdf/1711.08996v1.pdf | |
PWC | https://paperswithcode.com/paper/dense-3d-regression-for-hand-pose-estimation |
Repo | https://github.com/melonwan/denseReg |
Framework | tf |
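The vote-casting and mean-shift aggregation step can be sketched on its own: each pixel contributes a 3D vote (its back-projected point plus a predicted offset) weighted by its heat-map response, and a kernel-weighted update finds the consensus joint location. The bandwidth, iteration count, and Gaussian kernel are illustrative choices, not the paper's exact variant.

```python
# Sketch of weighted vote aggregation with a mean-shift style update for one
# hand joint. Kernel, bandwidth and iteration count are illustrative.
import numpy as np

def aggregate_votes(points, offsets, weights, bandwidth=30.0, iters=10):
    """points, offsets: (N, 3); weights: (N,). Returns one 3D joint estimate."""
    votes = points + offsets
    est = np.average(votes, axis=0, weights=weights)     # weighted mean as init
    for _ in range(iters):
        d2 = ((votes - est) ** 2).sum(axis=1)
        k = weights * np.exp(-d2 / (2 * bandwidth ** 2))  # Gaussian kernel
        est = (k[:, None] * votes).sum(axis=0) / k.sum()
    return est
```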
Identifying networks with common organizational principles
Title | Identifying networks with common organizational principles |
Authors | Anatol E. Wegner, Luis Ospina-Forero, Robert E. Gaunt, Charlotte M. Deane, Gesine Reinert |
Abstract | Many complex systems can be represented as networks, and the problem of network comparison is becoming increasingly relevant. There are many techniques for network comparison, from simply comparing network summary statistics to sophisticated but computationally costly alignment-based approaches. Yet it remains challenging to accurately cluster networks that are of a different size and density, but hypothesized to be structurally similar. In this paper, we address this problem by introducing a new network comparison methodology that is aimed at identifying common organizational principles in networks. The methodology is simple, intuitive and applicable in a wide variety of settings ranging from the functional classification of proteins to tracking the evolution of a world trade network. |
Tasks | |
Published | 2017-04-02 |
URL | http://arxiv.org/abs/1704.00387v1 |
http://arxiv.org/pdf/1704.00387v1.pdf | |
PWC | https://paperswithcode.com/paper/identifying-networks-with-common |
Repo | https://github.com/alan-turing-institute/network-comparison |
Framework | none |
Intelligent Word Embeddings of Free-Text Radiology Reports
Title | Intelligent Word Embeddings of Free-Text Radiology Reports |
Authors | Imon Banerjee, Sriraman Madhavan, Roger Eric Goldman, Daniel L. Rubin |
Abstract | Radiology reports are a rich resource for advancing deep learning applications in medicine by leveraging the large volume of data continuously being updated, integrated, and shared. However, there are significant challenges as well, largely due to the ambiguity and subtlety of natural language. We propose a hybrid strategy that combines semantic-dictionary mapping and word2vec modeling for creating dense vector embeddings of free-text radiology reports. Our method leverages the benefits of both semantic-dictionary mapping as well as unsupervised learning. Using the vector representation, we automatically classify the radiology reports into three classes denoting confidence in the diagnosis of intracranial hemorrhage by the interpreting radiologist. We performed experiments with varying hyperparameter settings of the word embeddings and a range of different classifiers. Best performance achieved was a weighted precision of 88% and weighted recall of 90%. Our work offers the potential to leverage unstructured electronic health record data by allowing direct analysis of narrative clinical notes. |
Tasks | Word Embeddings |
Published | 2017-11-19 |
URL | http://arxiv.org/abs/1711.06968v1 |
http://arxiv.org/pdf/1711.06968v1.pdf | |
PWC | https://paperswithcode.com/paper/intelligent-word-embeddings-of-free-text |
Repo | https://github.com/imonban/RadiologyReportEmbedding |
Framework | none |
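A hedged sketch of the report-level pipeline: normalize tokens through a semantic dictionary, average their word2vec-style vectors into one report vector, and fit a standard classifier on the three confidence classes. The dictionary, embeddings, labels, and the choice of logistic regression are placeholders, not the paper's configuration.

```python
# Minimal sketch of the hybrid strategy: semantic-dictionary normalization,
# averaged word embeddings per report, then a simple classifier.
import numpy as np
from sklearn.linear_model import LogisticRegression

def report_vector(tokens, word_vectors, synonym_map, dim=100):
    """Average the embeddings of (dictionary-normalized) tokens."""
    normalized = [synonym_map.get(t, t) for t in tokens]
    vecs = [word_vectors[t] for t in normalized if t in word_vectors]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

def train_classifier(reports, labels, word_vectors, synonym_map):
    """reports: list of token lists; labels: the three confidence classes."""
    X = np.stack([report_vector(r, word_vectors, synonym_map) for r in reports])
    return LogisticRegression(max_iter=1000).fit(X, labels)
```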
Collaborative Nested Sampling: Big Data vs. complex physical models
Title | Collaborative Nested Sampling: Big Data vs. complex physical models |
Authors | Johannes Buchner |
Abstract | The data torrent unleashed by current and upcoming astronomical surveys demands scalable analysis methods. Many machine learning approaches scale well, but separating the instrument measurement from the physical effects of interest, dealing with variable errors, and deriving parameter uncertainties is often an after-thought. Classic forward-folding analyses with Markov Chain Monte Carlo or Nested Sampling enable parameter estimation and model comparison, even for complex and slow-to-evaluate physical models. However, these approaches require independent runs for each data set, implying an unfeasible number of model evaluations in the Big Data regime. Here I present a new algorithm, collaborative nested sampling, for deriving parameter probability distributions for each observation. Importantly, the number of physical model evaluations scales sub-linearly with the number of data sets, and no assumptions about homogeneous errors, Gaussianity, the form of the model or heterogeneity/completeness of the observations need to be made. Collaborative nested sampling has immediate application in speeding up analyses of large surveys, integral-field-unit observations, and Monte Carlo simulations. |
Tasks | |
Published | 2017-07-14 |
URL | http://arxiv.org/abs/1707.04476v5 |
http://arxiv.org/pdf/1707.04476v5.pdf | |
PWC | https://paperswithcode.com/paper/collaborative-nested-sampling-big-data-vs |
Repo | https://github.com/JohannesBuchner/massivedatans |
Framework | none |
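The collaborative variant is not specified in enough detail in the abstract to reproduce, but the classic nested sampling loop it builds on is short. The sketch below uses naive rejection sampling for the likelihood-constrained prior draw and omits the final live-point contribution to the evidence.

```python
# Classic nested sampling loop (not the collaborative variant): maintain live
# points, repeatedly replace the worst one with a prior sample at higher
# likelihood, and accumulate the evidence.
import numpy as np

def nested_sampling(log_likelihood, prior_sample, n_live=100, n_iter=1000,
                    rng=np.random.default_rng(0)):
    live = np.array([prior_sample(rng) for _ in range(n_live)])
    live_logl = np.array([log_likelihood(p) for p in live])
    log_z = -np.inf
    log_w0 = np.log(1 - np.exp(-1.0 / n_live))        # shell width in prior volume
    for i in range(n_iter):
        worst = live_logl.argmin()
        log_z = np.logaddexp(log_z, log_w0 - i / n_live + live_logl[worst])
        while True:                                   # naive constrained prior draw
            cand = prior_sample(rng)
            cand_logl = log_likelihood(cand)
            if cand_logl > live_logl[worst]:
                break
        live[worst], live_logl[worst] = cand, cand_logl
    return log_z   # remaining live-point contribution omitted for brevity
```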
Stochastic Non-convex Ordinal Embedding with Stabilized Barzilai-Borwein Step Size
Title | Stochastic Non-convex Ordinal Embedding with Stabilized Barzilai-Borwein Step Size |
Authors | Ke Ma, Jinshan Zeng, Jiechao Xiong, Qianqian Xu, Xiaochun Cao, Wei Liu, Yuan Yao |
Abstract | Learning representations from relative similarity comparisons, often called ordinal embedding, has gained increasing attention in recent years. Most existing methods are batch methods designed mainly around convex optimization, e.g., the projected gradient descent method. However, they are generally time-consuming because the singular value decomposition (SVD) is commonly adopted during the update, especially when the data size is very large. To overcome this challenge, we propose a stochastic algorithm called SVRG-SBB, which has the following features: (a) it is SVD-free via dropping convexity, with good scalability through the use of a stochastic algorithm, i.e., stochastic variance reduced gradient (SVRG), and (b) it has an adaptive step size choice via a new stabilized Barzilai-Borwein (SBB) method, as the original version for convex problems might fail for the considered stochastic non-convex optimization problem. Moreover, we show that the proposed algorithm converges to a stationary point at a rate $\mathcal{O}(\frac{1}{T})$ in our setting, where $T$ is the number of total iterations. Numerous simulations and real-world data experiments are conducted to show the effectiveness of the proposed algorithm by comparing with state-of-the-art methods; in particular, it achieves much lower computational cost with good prediction performance. |
Tasks | |
Published | 2017-11-17 |
URL | http://arxiv.org/abs/1711.06446v2 |
http://arxiv.org/pdf/1711.06446v2.pdf | |
PWC | https://paperswithcode.com/paper/stochastic-non-convex-ordinal-embedding-with |
Repo | https://github.com/alphaprime/Stabilized_Stochastic_BB |
Framework | none |
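A hedged sketch of the algorithmic skeleton: SVRG with a Barzilai-Borwein step size recomputed once per outer loop. The exact stabilization in the paper's SBB rule is not copied here; the epsilon safeguard in the denominator is a generic assumption.

```python
# SVRG with a Barzilai-Borwein step size recomputed per epoch; the epsilon
# term stands in for the paper's stabilization (an assumption, not its rule).
import numpy as np

def svrg_bb(grad_i, w0, n, epochs=20, m=None, eta0=0.01, eps=1e-8,
            rng=np.random.default_rng(0)):
    """grad_i(w, i): stochastic gradient of the i-th term; n: number of terms."""
    m = m or n
    w, w_prev, g_prev, eta = w0.copy(), None, None, eta0
    for _ in range(epochs):
        full_grad = np.mean([grad_i(w, i) for i in range(n)], axis=0)
        if w_prev is not None:                        # BB step from successive snapshots
            s, y = w - w_prev, full_grad - g_prev
            eta = (s @ s) / (m * (abs(s @ y) + eps))  # stabilized by eps (assumption)
        w_prev, g_prev = w.copy(), full_grad.copy()
        x = w.copy()
        for _ in range(m):                            # inner SVRG loop
            i = rng.integers(n)
            x -= eta * (grad_i(x, i) - grad_i(w, i) + full_grad)
        w = x
    return w
```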
3D Densely Convolutional Networks for Volumetric Segmentation
Title | 3D Densely Convolutional Networks for Volumetric Segmentation |
Authors | Toan Duc Bui, Jitae Shin, Taesup Moon |
Abstract | In the isointense stage, accurate volumetric image segmentation is a challenging task due to the low contrast between tissues. In this paper, we propose a novel very deep network architecture based on a densely convolutional network for volumetric brain segmentation. The proposed network architecture provides a dense connection between layers that aims to improve the information flow in the network. By concatenating the feature maps of fine and coarse dense blocks, it captures multi-scale contextual information. Experimental results demonstrate significant advantages of the proposed method over existing methods, in terms of both segmentation accuracy and parameter efficiency, in the MICCAI grand challenge on 6-month infant brain MRI segmentation. |
Tasks | Brain Segmentation, Infant Brain Mri Segmentation, Semantic Segmentation |
Published | 2017-09-11 |
URL | http://arxiv.org/abs/1709.03199v2 |
http://arxiv.org/pdf/1709.03199v2.pdf | |
PWC | https://paperswithcode.com/paper/3d-densely-convolutional-networks-for |
Repo | https://github.com/tbuikr/3D_DenseSeg |
Framework | caffe2 |
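The dense connectivity described above is straightforward to sketch in 3D. The growth rate, depth, and BN-ReLU-Conv ordering below are illustrative assumptions, not the paper's exact architecture.

```python
# Minimal 3D dense block in the spirit of the abstract: each layer's feature
# maps are concatenated to all earlier ones to improve information flow.
import torch
import torch.nn as nn

class DenseBlock3D(nn.Module):
    def __init__(self, in_channels, growth_rate=16, num_layers=4):
        super().__init__()
        self.layers = nn.ModuleList()
        ch = in_channels
        for _ in range(num_layers):
            self.layers.append(nn.Sequential(
                nn.BatchNorm3d(ch), nn.ReLU(inplace=True),
                nn.Conv3d(ch, growth_rate, kernel_size=3, padding=1)))
            ch += growth_rate
        self.out_channels = ch

    def forward(self, x):
        for layer in self.layers:
            x = torch.cat([x, layer(x)], dim=1)   # dense connectivity
        return x

block = DenseBlock3D(32)
out = block(torch.randn(1, 32, 16, 32, 32))       # -> (1, 96, 16, 32, 32)
```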
MicroExpNet: An Extremely Small and Fast Model For Expression Recognition From Face Images
Title | MicroExpNet: An Extremely Small and Fast Model For Expression Recognition From Face Images |
Authors | İlke Çuğu, Eren Şener, Emre Akbaş |
Abstract | This paper is aimed at creating extremely small and fast convolutional neural networks (CNN) for the problem of facial expression recognition (FER) from frontal face images. To this end, we employed the popular knowledge distillation (KD) method and identified two major shortcomings with its use: 1) a fine-grained grid search is needed for tuning the temperature hyperparameter and 2) to find the optimal size-accuracy balance, one needs to search for the final network size (or the compression rate). On the other hand, KD proved to be useful for model compression on the FER problem, and we discovered that its effect becomes more and more significant as the model size decreases. In addition, we hypothesized that translation invariance achieved using max-pooling layers would not be useful for the FER problem, as the expressions are sensitive to small, pixel-wise changes around the eye and the mouth. However, we have found an intriguing improvement in generalization when max-pooling is used. We conducted experiments on two widely used FER datasets, CK+ and Oulu-CASIA. Our smallest model (MicroExpNet), obtained using knowledge distillation, is less than 1MB in size and works at 1851 frames per second on an Intel i7 CPU. Despite being less accurate than the state-of-the-art, MicroExpNet still provides significant insights for designing a microarchitecture for the FER problem. |
Tasks | Facial Expression Recognition, Model Compression |
Published | 2017-11-19 |
URL | https://arxiv.org/abs/1711.07011v4 |
https://arxiv.org/pdf/1711.07011v4.pdf | |
PWC | https://paperswithcode.com/paper/microexpnet-an-extremely-small-and-fast-model |
Repo | https://github.com/cuguilke/microexpnet |
Framework | tf |
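The knowledge-distillation objective whose temperature tuning the paper flags as a pain point looks like this in PyTorch; the temperature and the soft/hard loss weighting below are illustrative values, not the paper's settings.

```python
# Sketch of the knowledge-distillation loss with a temperature T. The values
# of T and alpha are illustrative assumptions.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """alpha * T^2 * KL(student_T || teacher_T) + (1 - alpha) * CE(student, labels)."""
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```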
Towards Automatic Learning of Procedures from Web Instructional Videos
Title | Towards Automatic Learning of Procedures from Web Instructional Videos |
Authors | Luowei Zhou, Chenliang Xu, Jason J. Corso |
Abstract | The potential for agents, whether embodied or software, to learn by observing other agents performing procedures involving objects and actions is rich. Current research on automatic procedure learning heavily relies on action labels or video subtitles, even during the evaluation phase, which makes them infeasible in real-world scenarios. This leads to our question: can the human-consensus structure of a procedure be learned from a large set of long, unconstrained videos (e.g., instructional videos from YouTube) with only visual evidence? To answer this question, we introduce the problem of procedure segmentation: segmenting a video procedure into category-independent procedure segments. Given that no large-scale dataset is available for this problem, we collect a large-scale procedure segmentation dataset with procedure segments temporally localized and described; we use cooking videos and name the dataset YouCook2. We propose a segment-level recurrent network for generating procedure segments by modeling the dependencies across segments. The generated segments can be used as pre-processing for other tasks, such as dense video captioning and event parsing. We show in our experiments that the proposed model outperforms competitive baselines in procedure segmentation. |
Tasks | Dense Video Captioning, Video Captioning |
Published | 2017-03-28 |
URL | http://arxiv.org/abs/1703.09788v3 |
http://arxiv.org/pdf/1703.09788v3.pdf | |
PWC | https://paperswithcode.com/paper/towards-automatic-learning-of-procedures-from |
Repo | https://github.com/LuoweiZhou/ProcNets-YouCook2 |
Framework | torch |
Glyph-aware Embedding of Chinese Characters
Title | Glyph-aware Embedding of Chinese Characters |
Authors | Falcon Z. Dai, Zheng Cai |
Abstract | Given the advantage and recent success of English character-level and subword-unit models in several NLP tasks, we consider the equivalent modeling problem for Chinese. Chinese script is logographic and many Chinese logograms are composed of common substructures that provide semantic, phonetic and syntactic hints. In this work, we propose to explicitly incorporate the visual appearance of a character’s glyph in its representation, resulting in a novel glyph-aware embedding of Chinese characters. Inspired by the success of convolutional neural networks in computer vision, we use them to incorporate the spatio-structural patterns of Chinese glyphs as rendered in raw pixels. In the context of two basic Chinese NLP tasks of language modeling and word segmentation, the model learns to represent each character’s task-relevant semantic and syntactic information in the character-level embedding. |
Tasks | Language Modelling |
Published | 2017-08-31 |
URL | http://arxiv.org/abs/1709.00028v1 |
http://arxiv.org/pdf/1709.00028v1.pdf | |
PWC | https://paperswithcode.com/paper/glyph-aware-embedding-of-chinese-characters |
Repo | https://github.com/falcondai/chinese-char-lm |
Framework | tf |
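The glyph-aware idea, rendering each character to raw pixels and encoding it with a small CNN, can be sketched as follows. The font file path is a placeholder and the encoder architecture is an assumption, not the paper's model.

```python
# Sketch of a glyph-aware character embedding: render each character to a
# small bitmap and run it through a tiny CNN. Font path and network shape
# are placeholders.
import numpy as np
import torch
import torch.nn as nn
from PIL import Image, ImageDraw, ImageFont

def render_glyph(char, font_path="NotoSansCJK-Regular.ttc", size=24):
    font = ImageFont.truetype(font_path, size)      # hypothetical font file
    img = Image.new("L", (size, size), color=0)
    ImageDraw.Draw(img).text((0, 0), char, fill=255, font=font)
    return torch.from_numpy(np.asarray(img, dtype=np.float32) / 255.0)

glyph_encoder = nn.Sequential(                      # raw pixels -> embedding
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 64))                              # 64-d character embedding

emb = glyph_encoder(render_glyph("木").unsqueeze(0).unsqueeze(0))  # (1, 64)
```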
Neural AMR: Sequence-to-Sequence Models for Parsing and Generation
Title | Neural AMR: Sequence-to-Sequence Models for Parsing and Generation |
Authors | Ioannis Konstas, Srinivasan Iyer, Mark Yatskar, Yejin Choi, Luke Zettlemoyer |
Abstract | Sequence-to-sequence models have shown strong performance across a broad range of applications. However, their application to parsing and generating text using Abstract Meaning Representation (AMR) has been limited, due to the relatively limited amount of labeled data and the non-sequential nature of the AMR graphs. We present a novel training procedure that can lift this limitation using millions of unlabeled sentences and careful preprocessing of the AMR graphs. For AMR parsing, our model achieves competitive results of 62.1 SMATCH, the current best score reported without significant use of external semantic resources. For AMR generation, our model establishes a new state-of-the-art performance of BLEU 33.8. We present extensive ablative and qualitative analysis including strong evidence that sequence-based AMR models are robust against ordering variations of graph-to-sequence conversions. |
Tasks | Amr Parsing, Graph-to-Sequence |
Published | 2017-04-26 |
URL | http://arxiv.org/abs/1704.08381v3 |
http://arxiv.org/pdf/1704.08381v3.pdf | |
PWC | https://paperswithcode.com/paper/neural-amr-sequence-to-sequence-models-for |
Repo | https://github.com/sinantie/NeuralAmr |
Framework | torch |
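Sequence models need the AMR graph flattened into a token sequence; a depth-first linearization is the usual starting point for such preprocessing. The toy graph format and bracketing below are assumptions for illustration, not the paper's exact scheme.

```python
# Sketch of linearizing an AMR-like graph into a token sequence by depth-first
# traversal. The tuple-based graph format is an assumption made here.
def linearize(node):
    """node = (concept, [(relation, child_node), ...]) -> list of tokens."""
    concept, edges = node
    tokens = ["(", concept]
    for relation, child in edges:
        tokens += [relation] + linearize(child)
    tokens.append(")")
    return tokens

# "The boy wants to go" (re-entrancy duplicated for simplicity)
amr = ("want-01", [(":ARG0", ("boy", [])),
                   (":ARG1", ("go-01", [(":ARG0", ("boy", []))]))])
print(" ".join(linearize(amr)))
# ( want-01 :ARG0 ( boy ) :ARG1 ( go-01 :ARG0 ( boy ) ) )
```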