Paper Group AWR 53
Iterative fully convolutional neural networks for automatic vertebra segmentation and identification. Scaling provable adversarial defenses. A Reinforcement Learning Approach to Interactive-Predictive Neural Machine Translation. Graphite: Iterative Generative Modeling of Graphs. MC-GAN: Multi-conditional Generative Adversarial Network for Image Syn …
Iterative fully convolutional neural networks for automatic vertebra segmentation and identification
Title | Iterative fully convolutional neural networks for automatic vertebra segmentation and identification |
Authors | Nikolas Lessmann, Bram van Ginneken, Pim A. de Jong, Ivana Išgum |
Abstract | Precise segmentation and anatomical identification of the vertebrae provides the basis for automatic analysis of the spine, such as detection of vertebral compression fractures or other abnormalities. Most dedicated spine CT and MR scans as well as scans of the chest, abdomen or neck cover only part of the spine. Segmentation and identification should therefore not rely on the visibility of certain vertebrae or a certain number of vertebrae. We propose an iterative instance segmentation approach that uses a fully convolutional neural network to segment and label vertebrae one after the other, independently of the number of visible vertebrae. This instance-by-instance segmentation is enabled by combining the network with a memory component that retains information about already segmented vertebrae. The network iteratively analyzes image patches, using information from both image and memory to search for the next vertebra. To efficiently traverse the image, we include the prior knowledge that the vertebrae are always located next to each other, which is used to follow the vertebral column. This method was evaluated with five diverse datasets, including multiple modalities (CT and MR), various fields of view and coverages of different sections of the spine, and a particularly challenging set of low-dose chest CT scans. The proposed iterative segmentation method compares favorably with state-of-the-art methods and is fast, flexible and generalizable. |
Tasks | Instance Segmentation, Semantic Segmentation |
Published | 2018-04-12 |
URL | http://arxiv.org/abs/1804.04383v3 |
PDF | http://arxiv.org/pdf/1804.04383v3.pdf |
PWC | https://paperswithcode.com/paper/iterative-fully-convolutional-neural-networks |
Repo | https://github.com/leohsuofnthu/Pytorch-IterativeFCN |
Framework | pytorch |
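A minimal PyTorch sketch of the instance-by-instance idea described in the abstract (not the authors' architecture; see the linked repo for that): the network receives an image patch together with a memory channel marking already-segmented vertebrae and is applied in a loop that adds each new instance to the memory. The layer sizes, threshold, and stopping rule are illustrative, and the patch traversal along the spine is omitted.

```python
import torch
import torch.nn as nn

class IterativeSegNet(nn.Module):
    def __init__(self):
        super().__init__()
        # two input channels: image patch + memory of already-segmented vertebrae
        self.body = nn.Sequential(
            nn.Conv3d(2, 16, 3, padding=1), nn.ReLU(),
            nn.Conv3d(16, 1, 3, padding=1),
        )

    def forward(self, patch, memory):
        # patch, memory: (B, 1, D, H, W)
        return self.body(torch.cat([patch, memory], dim=1))

def segment_instances(net, patch, max_instances=30, threshold=0.5):
    memory = torch.zeros_like(patch)
    instances = []
    for _ in range(max_instances):
        mask = torch.sigmoid(net(patch, memory)) > threshold
        if mask.sum() == 0:  # no further vertebra detected in this patch
            break
        instances.append(mask)
        memory = torch.clamp(memory + mask.float(), 0, 1)  # remember this instance
    return instances

net = IterativeSegNet()
masks = segment_instances(net, torch.randn(1, 1, 16, 32, 32))
```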
Scaling provable adversarial defenses
Title | Scaling provable adversarial defenses |
Authors | Eric Wong, Frank R. Schmidt, Jan Hendrik Metzen, J. Zico Kolter |
Abstract | Recent work has developed methods for learning deep network classifiers that are provably robust to norm-bounded adversarial perturbation; however, these methods are currently only possible for relatively small feedforward networks. In this paper, in an effort to scale these approaches to substantially larger models, we extend previous work in three main directions. First, we present a technique for extending these training procedures to much more general networks, with skip connections (such as ResNets) and general nonlinearities; the approach is fully modular, and can be implemented automatically (analogous to automatic differentiation). Second, in the specific case of $\ell_\infty$ adversarial perturbations and networks with ReLU nonlinearities, we adopt a nonlinear random projection for training, which scales linearly in the number of hidden units (previous approaches scaled quadratically). Third, we show how to further improve robust error through cascade models. On both MNIST and CIFAR data sets, we train classifiers that improve substantially on the state of the art in provable robust adversarial error bounds: from 5.8% to 3.1% on MNIST (with $\ell_\infty$ perturbations of $\epsilon=0.1$), and from 80% to 36.4% on CIFAR (with $\ell_\infty$ perturbations of $\epsilon=2/255$). Code for all experiments in the paper is available at https://github.com/locuslab/convex_adversarial/. |
Tasks | |
Published | 2018-05-31 |
URL | http://arxiv.org/abs/1805.12514v2 |
PDF | http://arxiv.org/pdf/1805.12514v2.pdf |
PWC | https://paperswithcode.com/paper/scaling-provable-adversarial-defenses |
Repo | https://github.com/ColinQiyangLi/LConvNet |
Framework | pytorch |
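The linear-scaling random projection rests on the 1-stability of the Cauchy distribution: projecting a vector onto a standard-Cauchy vector gives a Cauchy variable whose scale is the vector's $\ell_1$ norm, so the median of absolute projections estimates that norm. A hedged toy sketch of just this estimator (not the paper's full bound computation; see the linked repo for that):

```python
import torch

def estimate_row_l1(A, k=1000):
    """Estimate the l1 norm of each row of A from k Cauchy projections."""
    n = A.shape[1]
    R = torch.distributions.Cauchy(0.0, 1.0).sample((n, k))  # (n, k)
    proj = A @ R                                              # rows are Cauchy(0, ||a||_1)
    return proj.abs().median(dim=1).values

A = torch.randn(5, 1000)
print(estimate_row_l1(A))        # approximate row-wise l1 norms
print(A.abs().sum(dim=1))        # exact row-wise l1 norms for comparison
```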
A Reinforcement Learning Approach to Interactive-Predictive Neural Machine Translation
Title | A Reinforcement Learning Approach to Interactive-Predictive Neural Machine Translation |
Authors | Tsz Kin Lam, Julia Kreutzer, Stefan Riezler |
Abstract | We present an approach to interactive-predictive neural machine translation that attempts to reduce human effort from three directions: Firstly, instead of requiring humans to select, correct, or delete segments, we employ the idea of learning from human reinforcements in the form of judgments on the quality of partial translations. Secondly, human effort is further reduced by using the entropy of word predictions as an uncertainty criterion to trigger feedback requests. Lastly, online updates of the model parameters after every interaction allow the model to adapt quickly. We show in simulation experiments that reward signals on partial translations significantly improve character F-score and BLEU compared to feedback on full translations only, while human effort can be reduced to an average number of $5$ feedback requests for every input. |
Tasks | Machine Translation |
Published | 2018-05-03 |
URL | http://arxiv.org/abs/1805.01553v3 |
PDF | http://arxiv.org/pdf/1805.01553v3.pdf |
PWC | https://paperswithcode.com/paper/a-reinforcement-learning-approach-to |
Repo | https://github.com/heidelkin/BIPNMT |
Framework | pytorch |
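A hedged sketch of the uncertainty criterion mentioned in the abstract: the entropy of the model's next-word distribution decides whether to ask the human for feedback on the partial translation. The threshold here is illustrative, not a value from the paper.

```python
import torch
import torch.nn.functional as F

def should_request_feedback(logits, threshold=3.0):
    # logits: (vocab_size,) unnormalized scores for the next target word
    probs = F.softmax(logits, dim=-1)
    entropy = -(probs * torch.log(probs + 1e-12)).sum()
    return entropy.item() > threshold  # ask the human only when the model is uncertain

print(should_request_feedback(torch.randn(32000)))
```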
Graphite: Iterative Generative Modeling of Graphs
Title | Graphite: Iterative Generative Modeling of Graphs |
Authors | Aditya Grover, Aaron Zweig, Stefano Ermon |
Abstract | Graphs are a fundamental abstraction for modeling relational data. However, graphs are discrete and combinatorial in nature, and learning representations suitable for machine learning tasks poses statistical and computational challenges. In this work, we propose Graphite, an algorithmic framework for unsupervised learning of representations over nodes in large graphs using deep latent variable generative models. Our model parameterizes variational autoencoders (VAE) with graph neural networks, and uses a novel iterative graph refinement strategy inspired by low-rank approximations for decoding. On a wide variety of synthetic and benchmark datasets, Graphite outperforms competing approaches for the tasks of density estimation, link prediction, and node classification. Finally, we derive a theoretical connection between message passing in graph neural networks and mean-field variational inference. |
Tasks | Density Estimation, Link Prediction, Node Classification |
Published | 2018-03-28 |
URL | https://arxiv.org/abs/1803.10459v4 |
PDF | https://arxiv.org/pdf/1803.10459v4.pdf |
PWC | https://paperswithcode.com/paper/graphite-iterative-generative-modeling-of |
Repo | https://github.com/ermongroup/graphite |
Framework | tf |
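For context, a minimal sketch of the inner-product (VGAE-style) decoder that Graphite builds on: latent node embeddings are decoded into edge probabilities via a low-rank product. Graphite's actual decoder adds an iterative refinement on top of this, which is not reproduced here.

```python
import torch

def decode_adjacency(Z):
    # Z: (num_nodes, latent_dim) latent node embeddings from the encoder
    return torch.sigmoid(Z @ Z.t())  # (num_nodes, num_nodes) edge probabilities

A_hat = decode_adjacency(torch.randn(10, 16))
```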
MC-GAN: Multi-conditional Generative Adversarial Network for Image Synthesis
Title | MC-GAN: Multi-conditional Generative Adversarial Network for Image Synthesis |
Authors | Hyojin Park, YoungJoon Yoo, Nojun Kwak |
Abstract | In this paper, we introduce a new method for generating an object image from text attributes at a desired location when a base image is given. Going one step beyond existing studies on text-to-image generation, which focus mainly on the object’s appearance, the proposed method aims to generate an object image that preserves the given background information, which is the first attempt in this field. To tackle the problem, we propose a multi-conditional GAN (MC-GAN) which controls both the object and background information jointly. As a core component of MC-GAN, we propose a synthesis block which disentangles the object and background information in the training stage. This block enables MC-GAN to generate a realistic object image with the desired background by controlling the amount of the background information from the given base image using the foreground information from the text attributes. From the experiments with Caltech-200 bird and Oxford-102 flower datasets, we show that our model is able to generate photo-realistic images with a resolution of 128 x 128. The source code of MC-GAN is released. |
Tasks | Image Generation, Text-to-Image Generation |
Published | 2018-05-03 |
URL | http://arxiv.org/abs/1805.01123v5 |
PDF | http://arxiv.org/pdf/1805.01123v5.pdf |
PWC | https://paperswithcode.com/paper/mc-gan-multi-conditional-generative |
Repo | https://github.com/HYOJINPARK/MC_GAN |
Framework | pytorch |
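A hedged sketch of the compositing idea behind a synthesis block: predict a soft foreground mask from features and blend the generated foreground with the given base image. Layer sizes are illustrative and this is not the exact MC-GAN block; see the linked repo for the real one.

```python
import torch
import torch.nn as nn

class SynthesisBlock(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.to_mask = nn.Conv2d(channels, 1, 1)   # soft foreground mask
        self.to_rgb = nn.Conv2d(channels, 3, 1)    # generated foreground image

    def forward(self, features, background):
        mask = torch.sigmoid(self.to_mask(features))     # (B, 1, H, W)
        foreground = torch.tanh(self.to_rgb(features))   # (B, 3, H, W)
        # keep the base image where the mask is low, insert the object where it is high
        return mask * foreground + (1 - mask) * background

block = SynthesisBlock()
out = block(torch.randn(1, 64, 128, 128), torch.randn(1, 3, 128, 128))
```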
Image Colorization with Generative Adversarial Networks
Title | Image Colorization with Generative Adversarial Networks |
Authors | Kamyar Nazeri, Eric Ng, Mehran Ebrahimi |
Abstract | Over the last decade, the process of automatic image colorization has been of significant interest for several application areas including restoration of aged or degraded images. This problem is highly ill-posed due to the large degrees of freedom during the assignment of color information. Many of the recent developments in automatic colorization involve images that contain a common theme or require highly processed data such as semantic maps as input. In our approach, we attempt to fully generalize the colorization procedure using a conditional Deep Convolutional Generative Adversarial Network (DCGAN), extend current methods to high-resolution images and suggest training strategies that speed up the process and greatly stabilize it. The network is trained over datasets that are publicly available such as CIFAR-10 and Places365. The results of the generative model and traditional deep neural networks are compared. |
Tasks | Colorization |
Published | 2018-03-14 |
URL | http://arxiv.org/abs/1803.05400v5 |
PDF | http://arxiv.org/pdf/1803.05400v5.pdf |
PWC | https://paperswithcode.com/paper/image-colorization-with-generative |
Repo | https://github.com/PartheshSoni/Image-colorization-using-GANs |
Framework | none |
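A hedged sketch of the conditioning scheme: the generator maps the grayscale channel to color channels, and the discriminator judges the grayscale/color pair jointly. The real networks are deeper DCGAN-style models; these stubs only show the wiring.

```python
import torch
import torch.nn as nn

# generator: grayscale L channel -> two color (ab) channels
G = nn.Sequential(nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(),
                  nn.Conv2d(64, 2, 3, padding=1), nn.Tanh())
# discriminator: scores the (L, ab) pair jointly, i.e. it is conditioned on the input
D = nn.Sequential(nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
                  nn.Conv2d(64, 1, 4, stride=2, padding=1))

L = torch.randn(4, 1, 32, 32)
ab_fake = G(L)
score = D(torch.cat([L, ab_fake], dim=1))
```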
Multi-Task Learning with Multi-View Attention for Answer Selection and Knowledge Base Question Answering
Title | Multi-Task Learning with Multi-View Attention for Answer Selection and Knowledge Base Question Answering |
Authors | Yang Deng, Yuexiang Xie, Yaliang Li, Min Yang, Nan Du, Wei Fan, Kai Lei, Ying Shen |
Abstract | Answer selection and knowledge base question answering (KBQA) are two important tasks of question answering (QA) systems. Existing methods solve these two tasks separately, which requires a large amount of repetitive work and neglects the rich correlation information between tasks. In this paper, we tackle answer selection and KBQA tasks simultaneously via multi-task learning (MTL), motivated by two observations. First, both answer selection and KBQA can be regarded as a ranking problem, one at the text level and the other at the knowledge level. Second, these two tasks can benefit each other: answer selection can incorporate external knowledge from a knowledge base (KB), while KBQA can be improved by learning contextual information from answer selection. To fulfill the goal of jointly learning these two tasks, we propose a novel multi-task learning scheme that utilizes multi-view attention learned from various perspectives to enable these tasks to interact with each other as well as learn more comprehensive sentence representations. The experiments conducted on several real-world datasets demonstrate the effectiveness of the proposed method, and the performance of answer selection and KBQA is improved. Also, the multi-view attention scheme is shown to be effective in assembling attentive information from different representational perspectives. |
Tasks | Answer Selection, Knowledge Base Question Answering, Multi-Task Learning, Question Answering |
Published | 2018-12-06 |
URL | http://arxiv.org/abs/1812.02354v1 |
PDF | http://arxiv.org/pdf/1812.02354v1.pdf |
PWC | https://paperswithcode.com/paper/multi-task-learning-with-multi-view-attention |
Repo | https://github.com/dengyang17/MTQA |
Framework | none |
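A hedged sketch of the multi-task setup only, with a shared sentence encoder and one ranking head per task; the paper's multi-view attention mechanism is not reproduced here, and all dimensions are illustrative.

```python
import torch
import torch.nn as nn

class SharedMTLRanker(nn.Module):
    def __init__(self, vocab_size=30000, dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.encoder = nn.LSTM(dim, dim, batch_first=True)    # shared across tasks
        self.answer_selection_head = nn.Linear(2 * dim, 1)    # text-level ranking
        self.kbqa_head = nn.Linear(2 * dim, 1)                # knowledge-level ranking

    def encode(self, ids):
        _, (h, _) = self.encoder(self.embed(ids))
        return h[-1]                                           # (B, dim) sentence vector

    def forward(self, question_ids, candidate_ids, task):
        pair = torch.cat([self.encode(question_ids), self.encode(candidate_ids)], dim=-1)
        head = self.answer_selection_head if task == "answer_selection" else self.kbqa_head
        return head(pair).squeeze(-1)                          # ranking score per pair

model = SharedMTLRanker()
q = torch.randint(0, 30000, (4, 12))
c = torch.randint(0, 30000, (4, 30))
print(model(q, c, task="answer_selection").shape)              # torch.Size([4])
```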
Topic Discovery in Massive Text Corpora Based on Min-Hashing
Title | Topic Discovery in Massive Text Corpora Based on Min-Hashing |
Authors | Gibran Fuentes-Pineda, Ivan Vladimir Meza-Ruiz |
Abstract | The task of discovering topics in text corpora has been dominated by Latent Dirichlet Allocation and other Topic Models for over a decade. In order to apply these approaches to massive text corpora, the vocabulary needs to be reduced considerably and large computer clusters and/or GPUs are typically required. Moreover, the number of topics must be provided beforehand, but this depends on the corpus characteristics and it is often difficult to estimate, especially for massive text corpora. Unfortunately, both topic quality and time complexity are sensitive to this choice. This paper describes an alternative approach to discover topics based on Min-Hashing, which can handle massive text corpora and large vocabularies using modest computer hardware and does not require fixing the number of topics in advance. The basic idea is to generate multiple random partitions of the corpus vocabulary to find sets of highly co-occurring words, which are then clustered to produce the final topics. In contrast to probabilistic topic models where topics are distributions over the complete vocabulary, the topics discovered by the proposed approach are sets of highly co-occurring words. Interestingly, these topics underlie various themes with different levels of granularity. An extensive qualitative and quantitative evaluation using the 20 Newsgroups (18K), Reuters (800K), Spanish Wikipedia (1M), and English Wikipedia (5M) corpora shows that the proposed approach is able to consistently discover meaningful and coherent topics. Remarkably, the time complexity of the proposed approach is linear with respect to corpus and vocabulary size; a non-parallel implementation was able to discover topics from the entire English edition of Wikipedia with over 5 million documents and 1 million words in less than 7 hours. |
Tasks | Topic Models |
Published | 2018-07-03 |
URL | https://arxiv.org/abs/1807.00938v2 |
PDF | https://arxiv.org/pdf/1807.00938v2.pdf |
PWC | https://paperswithcode.com/paper/topic-discovery-in-massive-text-corpora-based |
Repo | https://github.com/gibranfp/Sampled-MinHashing |
Framework | none |
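A hedged toy sketch of the core Min-Hashing step: words whose document-occurrence sets collide under a tuple of min-hash values tend to co-occur. The full Sampled-MinHashing method repeats this over many random partitions and clusters the resulting word sets into topics, which is omitted here.

```python
import random
from collections import defaultdict

def minhash_partition(word_to_docs, num_hashes=3, seed=0):
    rng = random.Random(seed)
    salts = [rng.getrandbits(32) for _ in range(num_hashes)]  # one salt per hash function
    buckets = defaultdict(list)
    for word, docs in word_to_docs.items():
        # min-hash signature of the word's document-occurrence set
        signature = tuple(min(hash((salt, d)) for d in docs) for salt in salts)
        buckets[signature].append(word)
    # words sharing a signature co-occur in (roughly) the same documents
    return [words for words in buckets.values() if len(words) > 1]

word_to_docs = {"goal": {1, 2, 5}, "match": {1, 2, 5}, "election": {3, 4}, "vote": {3, 4}}
print(minhash_partition(word_to_docs))
```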
X-ray-transform Invariant Anatomical Landmark Detection for Pelvic Trauma Surgery
Title | X-ray-transform Invariant Anatomical Landmark Detection for Pelvic Trauma Surgery |
Authors | Bastian Bier, Mathias Unberath, Jan-Nico Zaech, Javad Fotouhi, Mehran Armand, Greg Osgood, Nassir Navab, Andreas Maier |
Abstract | X-ray image guidance enables percutaneous alternatives to complex procedures. Unfortunately, the indirect view onto the anatomy in addition to projective simplification substantially increase the task-load for the surgeon. Additional 3D information such as knowledge of anatomical landmarks can benefit surgical decision making in complicated scenarios. Automatic detection of these landmarks in transmission imaging is challenging since image-domain features characteristic to a certain landmark change substantially depending on the viewing direction. Consequently and to the best of our knowledge, the above problem has not yet been addressed. In this work, we present a method to automatically detect anatomical landmarks in X-ray images independent of the viewing direction. To this end, a sequential prediction framework based on convolutional layers is trained on synthetically generated data of the pelvic anatomy to predict 23 landmarks in single X-ray images. View independence is contingent on training conditions and, here, is achieved on a spherical segment covering (120 x 90) degrees in LAO/RAO and CRAN/CAUD, respectively, centered around AP. On synthetic data, the proposed approach achieves a mean prediction error of 5.6 +- 4.5 mm. We demonstrate that the proposed network is immediately applicable to clinically acquired data of the pelvis. In particular, we show that our intra-operative landmark detection together with pre-operative CT enables X-ray pose estimation which, ultimately, benefits initialization of image-based 2D/3D registration. |
Tasks | Decision Making, Pose Estimation |
Published | 2018-03-22 |
URL | http://arxiv.org/abs/1803.08608v1 |
PDF | http://arxiv.org/pdf/1803.08608v1.pdf |
PWC | https://paperswithcode.com/paper/x-ray-transform-invariant-anatomical-landmark |
Repo | https://github.com/mathiasunberath/DeepDRR |
Framework | pytorch |
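A hedged sketch of heatmap-based landmark prediction, assuming the common convention of one output heatmap per landmark with the location read off as the per-channel argmax; the paper's sequential (multi-stage) prediction framework is not reproduced here.

```python
import torch
import torch.nn as nn

NUM_LANDMARKS = 23
net = nn.Sequential(nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
                    nn.Conv2d(32, NUM_LANDMARKS, 1))           # one heatmap per landmark

xray = torch.randn(1, 1, 256, 256)
heatmaps = net(xray)                                            # (1, 23, 256, 256)
flat = heatmaps.flatten(2).argmax(dim=2)                        # (1, 23) flat indices
rows = torch.div(flat, 256, rounding_mode="floor")
cols = flat % 256
coords = torch.stack([rows, cols], dim=-1)                      # (1, 23, 2) as (row, col)
```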
Pedestrian-Synthesis-GAN: Generating Pedestrian Data in Real Scene and Beyond
Title | Pedestrian-Synthesis-GAN: Generating Pedestrian Data in Real Scene and Beyond |
Authors | Xi Ouyang, Yu Cheng, Yifan Jiang, Chun-Liang Li, Pan Zhou |
Abstract | State-of-the-art pedestrian detection models have achieved great success in many benchmarks. However, these models require a large amount of annotated data, and the labeling process usually takes considerable time and effort. In this paper, we propose a method to generate labeled pedestrian data and adapt them to support the training of pedestrian detectors. The proposed framework is built on the Generative Adversarial Network (GAN) with multiple discriminators, trying to synthesize realistic pedestrians and learn the background context simultaneously. To handle pedestrians of different sizes, we adopt the Spatial Pyramid Pooling (SPP) layer in the discriminator. We conduct experiments on two benchmarks. The results show that our framework can smoothly synthesize pedestrians on background images with varying content and levels of detail. To quantitatively evaluate our approach, we add the generated samples into the training data of the baseline pedestrian detectors and show that the synthetic images are able to improve the detectors’ performance. |
Tasks | Pedestrian Detection |
Published | 2018-04-05 |
URL | http://arxiv.org/abs/1804.02047v2 |
PDF | http://arxiv.org/pdf/1804.02047v2.pdf |
PWC | https://paperswithcode.com/paper/pedestrian-synthesis-gan-generating |
Repo | https://github.com/HilmiK/PS-Gan-modified |
Framework | pytorch |
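A hedged sketch of a Spatial Pyramid Pooling layer of the kind used in the discriminator to handle pedestrians of different sizes: pooling the feature map at several fixed grid resolutions and concatenating yields a fixed-length descriptor regardless of input size. The pyramid levels are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SPP(nn.Module):
    def __init__(self, levels=(1, 2, 4)):
        super().__init__()
        self.levels = levels

    def forward(self, x):                        # x: (B, C, H, W), any H and W
        pooled = [F.adaptive_max_pool2d(x, level).flatten(1) for level in self.levels]
        return torch.cat(pooled, dim=1)          # (B, C * sum(l * l for l in levels))

spp = SPP()
print(spp(torch.randn(2, 64, 37, 21)).shape)     # torch.Size([2, 1344]), independent of H, W
```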
Deep Network Interpolation for Continuous Imagery Effect Transition
Title | Deep Network Interpolation for Continuous Imagery Effect Transition |
Authors | Xintao Wang, Ke Yu, Chao Dong, Xiaoou Tang, Chen Change Loy |
Abstract | Deep convolutional neural network has demonstrated its capability of learning a deterministic mapping for the desired imagery effect. However, the large variety of user flavors motivates the possibility of continuous transition among different output effects. Unlike existing methods that require a specific design to achieve one particular transition (e.g., style transfer), we propose a simple yet universal approach to attain a smooth control of diverse imagery effects in many low-level vision tasks, including image restoration, image-to-image translation, and style transfer. Specifically, our method, namely Deep Network Interpolation (DNI), applies linear interpolation in the parameter space of two or more correlated networks. A smooth control of imagery effects can be achieved by tweaking the interpolation coefficients. In addition to DNI and its broad applications, we also investigate the mechanism of network interpolation from the perspective of learned filters. |
Tasks | Image Restoration, Image-to-Image Translation, Style Transfer |
Published | 2018-11-26 |
URL | http://arxiv.org/abs/1811.10515v1 |
PDF | http://arxiv.org/pdf/1811.10515v1.pdf |
PWC | https://paperswithcode.com/paper/deep-network-interpolation-for-continuous |
Repo | https://github.com/xinntao/DNI |
Framework | pytorch |
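A minimal sketch of the DNI operation described in the abstract: linearly interpolate the parameters of two networks with the same architecture and load the result to obtain an intermediate imagery effect.

```python
import torch

def interpolate_state_dicts(state_a, state_b, alpha):
    # alpha = 0 reproduces network A, alpha = 1 reproduces network B
    return {k: (1 - alpha) * state_a[k] + alpha * state_b[k] for k in state_a}

# usage (all three models share the same architecture):
# model_interp.load_state_dict(
#     interpolate_state_dicts(model_a.state_dict(), model_b.state_dict(), alpha=0.5))
```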
Challenges in detecting evolutionary forces in language change using diachronic corpora
Title | Challenges in detecting evolutionary forces in language change using diachronic corpora |
Authors | Andres Karjus, Richard A. Blythe, Simon Kirby, Kenny Smith |
Abstract | Newberry et al. (Detecting evolutionary forces in language change, Nature 551, 2017) tackle an important but difficult problem in linguistics, the testing of selective theories of language change against a null model of drift. Having applied a test from population genetics (the Frequency Increment Test) to a number of relevant examples, they suggest stochasticity has a previously under-appreciated role in language evolution. We replicate their results and find that while the overall observation holds, results produced by this approach on individual time series can be sensitive to how the corpus is organized into temporal segments (binning). Furthermore, we use a large set of simulations in conjunction with binning to systematically explore the range of applicability of the Frequency Increment Test. We conclude that care should be exercised with interpreting results of tests like the Frequency Increment Test on individual series, given the researcher degrees of freedom available when applying the test to corpus data, and fundamental differences between genetic and linguistic data. Our findings have implications for selection testing and temporal binning in general, as well as demonstrating the usefulness of simulations for evaluating methods newly introduced to the field. |
Tasks | Time Series |
Published | 2018-11-03 |
URL | https://arxiv.org/abs/1811.01275v2 |
PDF | https://arxiv.org/pdf/1811.01275v2.pdf |
PWC | https://paperswithcode.com/paper/challenges-in-detecting-evolutionary-forces |
Repo | https://github.com/andreskarjus/wfsim_fit |
Framework | none |
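A hedged sketch of the Frequency Increment Test in its common formulation (Feder et al. 2014): rescale the frequency increments and t-test whether their mean differs from zero, as drift predicts a zero mean. The temporal binning that the replication shows the test is sensitive to happens before this step; the numbers below are illustrative.

```python
import numpy as np
from scipy import stats

def frequency_increment_test(freqs, times):
    freqs, times = np.asarray(freqs, float), np.asarray(times, float)
    # rescaled increments are approximately standard normal under pure drift
    increments = np.diff(freqs) / np.sqrt(
        2 * freqs[:-1] * (1 - freqs[:-1]) * np.diff(times))
    return stats.ttest_1samp(increments, popmean=0.0)

# variant frequency per temporal bin (illustrative numbers)
print(frequency_increment_test([0.20, 0.28, 0.35, 0.46, 0.60], [1, 2, 3, 4, 5]))
```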
Towards Understanding Learning Representations: To What Extent Do Different Neural Networks Learn the Same Representation
Title | Towards Understanding Learning Representations: To What Extent Do Different Neural Networks Learn the Same Representation |
Authors | Liwei Wang, Lunjia Hu, Jiayuan Gu, Yue Wu, Zhiqiang Hu, Kun He, John Hopcroft |
Abstract | It is widely believed that learning good representations is one of the main reasons for the success of deep neural networks. Although highly intuitive, there is a lack of theory and systematic approach quantitatively characterizing what representations deep neural networks learn. In this work, we move a tiny step towards a theory and better understanding of the representations. Specifically, we study a simpler problem: How similar are the representations learned by two networks with identical architecture but trained from different initializations? We develop a rigorous theory based on the neuron activation subspace match model. The theory gives a complete characterization of the structure of neuron activation subspace matches, where the core concepts are maximum match and simple match, which describe the overall and the finest similarity between sets of neurons in two networks, respectively. We also propose efficient algorithms to find the maximum match and simple matches. Finally, we conduct extensive experiments using our algorithms. Experimental results suggest that, surprisingly, representations learned by the same convolutional layers of networks trained from different initializations are not as similar as prevalently expected, at least in terms of subspace match. |
Tasks | |
Published | 2018-10-28 |
URL | http://arxiv.org/abs/1810.11750v2 |
PDF | http://arxiv.org/pdf/1810.11750v2.pdf |
PWC | https://paperswithcode.com/paper/towards-understanding-learning |
Repo | https://github.com/MeckyWu/subspace-match |
Framework | pytorch |
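A hedged sketch of one way to probe subspace similarity (not the authors' maximum/simple match algorithms): for each neuron in one layer, measure how well its activation vector over a probe set is explained by the span of the other layer's activations via a least-squares residual.

```python
import torch

def residual_to_subspace(X, Y):
    # X: (num_inputs, n_x) activations of layer A; Y: (num_inputs, n_y) of layer B
    sol = torch.linalg.lstsq(Y, X).solution        # express X's neurons in span(Y)
    residual = X - Y @ sol
    return residual.norm(dim=0) / X.norm(dim=0)    # relative residual per neuron of A

X = torch.randn(512, 64)
Y = torch.randn(512, 64)
print(residual_to_subspace(X, Y).mean())
```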
On the Importance of Stereo for Accurate Depth Estimation: An Efficient Semi-Supervised Deep Neural Network Approach
Title | On the Importance of Stereo for Accurate Depth Estimation: An Efficient Semi-Supervised Deep Neural Network Approach |
Authors | Nikolai Smolyanskiy, Alexey Kamenev, Stan Birchfield |
Abstract | We revisit the problem of visual depth estimation in the context of autonomous vehicles. Despite the progress on monocular depth estimation in recent years, we show that the gap between monocular and stereo depth accuracy remains large - a particularly relevant result due to the prevalent reliance upon monocular cameras by vehicles that are expected to be self-driving. We argue that the challenges of removing this gap are significant, owing to fundamental limitations of monocular vision. As a result, we focus our efforts on depth estimation by stereo. We propose a novel semi-supervised learning approach to training a deep stereo neural network, along with a novel architecture containing a machine-learned argmax layer and a custom runtime (that will be shared publicly) that enables a smaller version of our stereo DNN to run on an embedded GPU. Competitive results are shown on the KITTI 2015 stereo dataset. We also evaluate the recent progress of stereo algorithms by measuring the impact upon accuracy of various design criteria. |
Tasks | Autonomous Vehicles, Depth Estimation, Stereo Depth Estimation |
Published | 2018-03-26 |
URL | http://arxiv.org/abs/1803.09719v3 |
PDF | http://arxiv.org/pdf/1803.09719v3.pdf |
PWC | https://paperswithcode.com/paper/on-the-importance-of-stereo-for-accurate |
Repo | https://github.com/iitmcvg/redtail |
Framework | tf |
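A hedged sketch of a differentiable (soft) argmax over a disparity cost volume, the kind of operation the paper's machine-learned argmax layer replaces with a learned variant; this fixed version is shown only to make the idea concrete.

```python
import torch
import torch.nn.functional as F

def soft_argmax_disparity(cost_volume):
    # cost_volume: (B, D, H, W); lower cost = better disparity hypothesis
    prob = F.softmax(-cost_volume, dim=1)
    disparities = torch.arange(cost_volume.shape[1], dtype=prob.dtype,
                               device=prob.device).view(1, -1, 1, 1)
    return (prob * disparities).sum(dim=1)          # (B, H, W) expected disparity

print(soft_argmax_disparity(torch.randn(1, 96, 32, 64)).shape)
```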
Real-Time Joint Semantic Segmentation and Depth Estimation Using Asymmetric Annotations
Title | Real-Time Joint Semantic Segmentation and Depth Estimation Using Asymmetric Annotations |
Authors | Vladimir Nekrasov, Thanuja Dharmasiri, Andrew Spek, Tom Drummond, Chunhua Shen, Ian Reid |
Abstract | Deployment of deep learning models in robotics as sensory information extractors can be a daunting task to handle, even using generic GPU cards. Here, we address three of its most prominent hurdles, namely, i) the adaptation of a single model to perform multiple tasks at once (in this work, we consider depth estimation and semantic segmentation crucial for acquiring geometric and semantic understanding of the scene), while ii) doing it in real-time, and iii) using asymmetric datasets with uneven numbers of annotations per modality. To overcome the first two issues, we adapt a recently proposed real-time semantic segmentation network, making changes to further reduce the number of floating point operations. To approach the third issue, we embrace a simple solution based on hard knowledge distillation under the assumption of having access to a powerful 'teacher' network. We showcase how our system can be easily extended to handle more tasks, and more datasets, all at once, performing depth estimation and segmentation both indoors and outdoors with a single model. Quantitatively, we achieve results equivalent to (or better than) current state-of-the-art approaches with one forward pass costing just 13ms and 6.5 GFLOPs on 640x480 inputs. This efficiency allows us to directly incorporate the raw predictions of our network into the SemanticFusion framework for dense 3D semantic reconstruction of the scene. |
Tasks | Depth Estimation, Real-Time Semantic Segmentation, Semantic Segmentation, Surface Normals Estimation |
Published | 2018-09-13 |
URL | http://arxiv.org/abs/1809.04766v2 |
PDF | http://arxiv.org/pdf/1809.04766v2.pdf |
PWC | https://paperswithcode.com/paper/real-time-joint-semantic-segmentation-and |
Repo | https://github.com/DrSleep/light-weight-refinenet |
Framework | pytorch |
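A hedged sketch of training with asymmetric annotations via hard knowledge distillation: where ground-truth depth is missing, a pre-trained teacher's depth prediction serves as the target, while segmentation uses the real labels. The losses and weighting are illustrative, not the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def joint_loss(seg_logits, depth_pred, seg_labels, depth_teacher, depth_weight=0.5):
    seg_loss = F.cross_entropy(seg_logits, seg_labels, ignore_index=255)  # real labels
    depth_loss = F.l1_loss(depth_pred, depth_teacher)   # teacher prediction as hard target
    return seg_loss + depth_weight * depth_loss

seg_logits = torch.randn(2, 21, 64, 64)
depth_pred = torch.rand(2, 1, 64, 64)
seg_labels = torch.randint(0, 21, (2, 64, 64))
depth_teacher = torch.rand(2, 1, 64, 64)
print(joint_loss(seg_logits, depth_pred, seg_labels, depth_teacher))
```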