Paper Group AWR 53
Iterative fully convolutional neural networks for automatic vertebra segmentation and identification. Scaling provable adversarial defenses. A Reinforcement Learning Approach to Interactive-Predictive Neural Machine Translation. Graphite: Iterative Generative Modeling of Graphs. MC-GAN: Multi-conditional Generative Adversarial Network for Image Syn …
Iterative fully convolutional neural networks for automatic vertebra segmentation and identification
Title | Iterative fully convolutional neural networks for automatic vertebra segmentation and identification |
Authors | Nikolas Lessmann, Bram van Ginneken, Pim A. de Jong, Ivana Išgum |
Abstract | Precise segmentation and anatomical identification of the vertebrae provides the basis for automatic analysis of the spine, such as detection of vertebral compression fractures or other abnormalities. Most dedicated spine CT and MR scans as well as scans of the chest, abdomen or neck cover only part of the spine. Segmentation and identification should therefore not rely on the visibility of certain vertebrae or a certain number of vertebrae. We propose an iterative instance segmentation approach that uses a fully convolutional neural network to segment and label vertebrae one after the other, independently of the number of visible vertebrae. This instance-by-instance segmentation is enabled by combining the network with a memory component that retains information about already segmented vertebrae. The network iteratively analyzes image patches, using information from both image and memory to search for the next vertebra. To efficiently traverse the image, we include the prior knowledge that the vertebrae are always located next to each other, which is used to follow the vertebral column. This method was evaluated with five diverse datasets, including multiple modalities (CT and MR), various fields of view and coverages of different sections of the spine, and a particularly challenging set of low-dose chest CT scans. The proposed iterative segmentation method compares favorably with state-of-the-art methods and is fast, flexible and generalizable. |
Tasks | Instance Segmentation, Semantic Segmentation |
Published | 2018-04-12 |
URL | http://arxiv.org/abs/1804.04383v3 |
PDF | http://arxiv.org/pdf/1804.04383v3.pdf |
PWC | https://paperswithcode.com/paper/iterative-fully-convolutional-neural-networks |
Repo | https://github.com/leohsuofnthu/Pytorch-IterativeFCN |
Framework | pytorch |
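A minimal PyTorch sketch of the instance-by-instance idea described in the abstract (not the authors' architecture; see the linked repo for that): the network receives an image patch together with a memory channel marking already-segmented vertebrae and is applied in a loop that adds each new instance to the memory. The layer sizes, threshold, and stopping rule are illustrative, and the patch traversal along the spine is omitted.

```python
import torch
import torch.nn as nn

class IterativeSegNet(nn.Module):
    def __init__(self):
        super().__init__()
        # two input channels: image patch + memory of already-segmented vertebrae
        self.body = nn.Sequential(
            nn.Conv3d(2, 16, 3, padding=1), nn.ReLU(),
            nn.Conv3d(16, 1, 3, padding=1),
        )

    def forward(self, patch, memory):
        # patch, memory: (B, 1, D, H, W)
        return self.body(torch.cat([patch, memory], dim=1))

def segment_instances(net, patch, max_instances=30, threshold=0.5):
    memory = torch.zeros_like(patch)
    instances = []
    for _ in range(max_instances):
        mask = torch.sigmoid(net(patch, memory)) > threshold
        if mask.sum() == 0:  # no further vertebra detected in this patch
            break
        instances.append(mask)
        memory = torch.clamp(memory + mask.float(), 0, 1)  # remember this instance
    return instances

net = IterativeSegNet()
masks = segment_instances(net, torch.randn(1, 1, 16, 32, 32))
```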
Scaling provable adversarial defenses
Title | Scaling provable adversarial defenses |
Authors | Eric Wong, Frank R. Schmidt, Jan Hendrik Metzen, J. Zico Kolter |
Abstract | Recent work has developed methods for learning deep network classifiers that are provably robust to norm-bounded adversarial perturbation; however, these methods are currently only possible for relatively small feedforward networks. In this paper, in an effort to scale these approaches to substantially larger models, we extend previous work in three main directions. First, we present a technique for extending these training procedures to much more general networks, with skip connections (such as ResNets) and general nonlinearities; the approach is fully modular, and can be implemented automatically (analogous to automatic differentiation). Second, in the specific case of $\ell_\infty$ adversarial perturbations and networks with ReLU nonlinearities, we adopt a nonlinear random projection for training, which scales linearly in the number of hidden units (previous approaches scaled quadratically). Third, we show how to further improve robust error through cascade models. On both MNIST and CIFAR data sets, we train classifiers that improve substantially on the state of the art in provable robust adversarial error bounds: from 5.8% to 3.1% on MNIST (with $\ell_\infty$ perturbations of $\epsilon=0.1$), and from 80% to 36.4% on CIFAR (with $\ell_\infty$ perturbations of $\epsilon=2/255$). Code for all experiments in the paper is available at https://github.com/locuslab/convex_adversarial/. |
Tasks | |
Published | 2018-05-31 |
URL | http://arxiv.org/abs/1805.12514v2 |
PDF | http://arxiv.org/pdf/1805.12514v2.pdf |
PWC | https://paperswithcode.com/paper/scaling-provable-adversarial-defenses |
Repo | https://github.com/ColinQiyangLi/LConvNet |
Framework | pytorch |
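The linear-scaling random projection rests on the 1-stability of the Cauchy distribution: projecting a vector onto a standard-Cauchy vector gives a Cauchy variable whose scale is the vector's $\ell_1$ norm, so the median of absolute projections estimates that norm. A hedged toy sketch of just this estimator (not the paper's full bound computation; see the linked repo for that):

```python
import torch

def estimate_row_l1(A, k=1000):
    """Estimate the l1 norm of each row of A from k Cauchy projections."""
    n = A.shape[1]
    R = torch.distributions.Cauchy(0.0, 1.0).sample((n, k))  # (n, k)
    proj = A @ R                                              # rows are Cauchy(0, ||a||_1)
    return proj.abs().median(dim=1).values

A = torch.randn(5, 1000)
print(estimate_row_l1(A))        # approximate row-wise l1 norms
print(A.abs().sum(dim=1))        # exact row-wise l1 norms for comparison
```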
A Reinforcement Learning Approach to Interactive-Predictive Neural Machine Translation
Title | A Reinforcement Learning Approach to Interactive-Predictive Neural Machine Translation |
Authors | Tsz Kin Lam, Julia Kreutzer, Stefan Riezler |
Abstract | We present an approach to interactive-predictive neural machine translation that attempts to reduce human effort from three directions: Firstly, instead of requiring humans to select, correct, or delete segments, we employ the idea of learning from human reinforcements in the form of judgments on the quality of partial translations. Secondly, human effort is further reduced by using the entropy of word predictions as an uncertainty criterion to trigger feedback requests. Lastly, online updates of the model parameters after every interaction allow the model to adapt quickly. We show in simulation experiments that reward signals on partial translations significantly improve character F-score and BLEU compared to feedback on full translations only, while human effort can be reduced to an average number of $5$ feedback requests for every input. |
Tasks | Machine Translation |
Published | 2018-05-03 |
URL | http://arxiv.org/abs/1805.01553v3 |
PDF | http://arxiv.org/pdf/1805.01553v3.pdf |
PWC | https://paperswithcode.com/paper/a-reinforcement-learning-approach-to |
Repo | https://github.com/heidelkin/BIPNMT |
Framework | pytorch |
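A hedged sketch of the uncertainty criterion mentioned in the abstract: the entropy of the model's next-word distribution decides whether to ask the human for feedback on the partial translation. The threshold here is illustrative, not a value from the paper.

```python
import torch
import torch.nn.functional as F

def should_request_feedback(logits, threshold=3.0):
    # logits: (vocab_size,) unnormalized scores for the next target word
    probs = F.softmax(logits, dim=-1)
    entropy = -(probs * torch.log(probs + 1e-12)).sum()
    return entropy.item() > threshold  # ask the human only when the model is uncertain

print(should_request_feedback(torch.randn(32000)))
```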
Graphite: Iterative Generative Modeling of Graphs
Title | Graphite: Iterative Generative Modeling of Graphs |
Authors | Aditya Grover, Aaron Zweig, Stefano Ermon |
Abstract | Graphs are a fundamental abstraction for modeling relational data. However, graphs are discrete and combinatorial in nature, and learning representations suitable for machine learning tasks poses statistical and computational challenges. In this work, we propose Graphite, an algorithmic framework for unsupervised learning of representations over nodes in large graphs using deep latent variable generative models. Our model parameterizes variational autoencoders (VAE) with graph neural networks, and uses a novel iterative graph refinement strategy inspired by low-rank approximations for decoding. On a wide variety of synthetic and benchmark datasets, Graphite outperforms competing approaches for the tasks of density estimation, link prediction, and node classification. Finally, we derive a theoretical connection between message passing in graph neural networks and mean-field variational inference. |
Tasks | Density Estimation, Link Prediction, Node Classification |
Published | 2018-03-28 |
URL | https://arxiv.org/abs/1803.10459v4 |
PDF | https://arxiv.org/pdf/1803.10459v4.pdf |
PWC | https://paperswithcode.com/paper/graphite-iterative-generative-modeling-of |
Repo | https://github.com/ermongroup/graphite |
Framework | tf |
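For context, a minimal sketch of the inner-product (VGAE-style) decoder that Graphite builds on: latent node embeddings are decoded into edge probabilities via a low-rank product. Graphite's actual decoder adds an iterative refinement on top of this, which is not reproduced here.

```python
import torch

def decode_adjacency(Z):
    # Z: (num_nodes, latent_dim) latent node embeddings from the encoder
    return torch.sigmoid(Z @ Z.t())  # (num_nodes, num_nodes) edge probabilities

A_hat = decode_adjacency(torch.randn(10, 16))
```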
MC-GAN: Multi-conditional Generative Adversarial Network for Image Synthesis
Title | MC-GAN: Multi-conditional Generative Adversarial Network for Image Synthesis |
Authors | Hyojin Park, YoungJoon Yoo, Nojun Kwak |
Abstract | In this paper, we introduce a new method for generating an object image from text attributes at a desired location when a base image is given. Going one step beyond existing studies on text-to-image generation, which focus mainly on the object’s appearance, the proposed method aims to generate an object image that preserves the given background information, which is the first attempt in this field. To tackle the problem, we propose a multi-conditional GAN (MC-GAN) which controls both the object and background information jointly. As a core component of MC-GAN, we propose a synthesis block which disentangles the object and background information in the training stage. This block enables MC-GAN to generate a realistic object image with the desired background by controlling the amount of the background information from the given base image using the foreground information from the text attributes. From the experiments with Caltech-200 bird and Oxford-102 flower datasets, we show that our model is able to generate photo-realistic images with a resolution of 128 x 128. The source code of MC-GAN is released. |
Tasks | Image Generation, Text-to-Image Generation |
Published | 2018-05-03 |
URL | http://arxiv.org/abs/1805.01123v5 |
PDF | http://arxiv.org/pdf/1805.01123v5.pdf |
PWC | https://paperswithcode.com/paper/mc-gan-multi-conditional-generative |
Repo | https://github.com/HYOJINPARK/MC_GAN |
Framework | pytorch |
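A hedged sketch of the compositing idea behind a synthesis block: predict a soft foreground mask from features and blend the generated foreground with the given base image. Layer sizes are illustrative and this is not the exact MC-GAN block; see the linked repo for the real one.

```python
import torch
import torch.nn as nn

class SynthesisBlock(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.to_mask = nn.Conv2d(channels, 1, 1)   # soft foreground mask
        self.to_rgb = nn.Conv2d(channels, 3, 1)    # generated foreground image

    def forward(self, features, background):
        mask = torch.sigmoid(self.to_mask(features))     # (B, 1, H, W)
        foreground = torch.tanh(self.to_rgb(features))   # (B, 3, H, W)
        # keep the base image where the mask is low, insert the object where it is high
        return mask * foreground + (1 - mask) * background

block = SynthesisBlock()
out = block(torch.randn(1, 64, 128, 128), torch.randn(1, 3, 128, 128))
```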
Image Colorization with Generative Adversarial Networks
Title | Image Colorization with Generative Adversarial Networks |
Authors | Kamyar Nazeri, Eric Ng, Mehran Ebrahimi |
Abstract | Over the last decade, the process of automatic image colorization has been of significant interest for several application areas including restoration of aged or degraded images. This problem is highly ill-posed due to the large degrees of freedom during the assignment of color information. Many of the recent developments in automatic colorization involve images that contain a common theme or require highly processed data such as semantic maps as input. In our approach, we attempt to fully generalize the colorization procedure using a conditional Deep Convolutional Generative Adversarial Network (DCGAN), extend current methods to high-resolution images and suggest training strategies that speed up the process and greatly stabilize it. The network is trained over datasets that are publicly available such as CIFAR-10 and Places365. The results of the generative model and traditional deep neural networks are compared. |
Tasks | Colorization |
Published | 2018-03-14 |
URL | http://arxiv.org/abs/1803.05400v5 |
PDF | http://arxiv.org/pdf/1803.05400v5.pdf |
PWC | https://paperswithcode.com/paper/image-colorization-with-generative |
Repo | https://github.com/PartheshSoni/Image-colorization-using-GANs |
Framework | none |
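A hedged sketch of the conditioning scheme: the generator maps the grayscale channel to color channels, and the discriminator judges the grayscale/color pair jointly. The real networks are deeper DCGAN-style models; these stubs only show the wiring.

```python
import torch
import torch.nn as nn

# generator: grayscale L channel -> two color (ab) channels
G = nn.Sequential(nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(),
                  nn.Conv2d(64, 2, 3, padding=1), nn.Tanh())
# discriminator: scores the (L, ab) pair jointly, i.e. it is conditioned on the input
D = nn.Sequential(nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
                  nn.Conv2d(64, 1, 4, stride=2, padding=1))

L = torch.randn(4, 1, 32, 32)
ab_fake = G(L)
score = D(torch.cat([L, ab_fake], dim=1))
```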
Multi-Task Learning with Multi-View Attention for Answer Selection and Knowledge Base Question Answering
Title | Multi-Task Learning with Multi-View Attention for Answer Selection and Knowledge Base Question Answering |
Authors | Yang Deng, Yuexiang Xie, Yaliang Li, Min Yang, Nan Du, Wei Fan, Kai Lei, Ying Shen |
Abstract | Answer selection and knowledge base question answering (KBQA) are two important tasks of question answering (QA) systems. Existing methods solve these two tasks separately, which requires a large amount of repetitive work and neglects the rich correlation information between tasks. In this paper, we tackle answer selection and KBQA tasks simultaneously via multi-task learning (MTL), motivated by two observations. First, both answer selection and KBQA can be regarded as a ranking problem, one at the text level and the other at the knowledge level. Second, these two tasks can benefit each other: answer selection can incorporate external knowledge from a knowledge base (KB), while KBQA can be improved by learning contextual information from answer selection. To fulfill the goal of jointly learning these two tasks, we propose a novel multi-task learning scheme that utilizes multi-view attention learned from various perspectives to enable these tasks to interact with each other as well as learn more comprehensive sentence representations. The experiments conducted on several real-world datasets demonstrate the effectiveness of the proposed method, and the performance of answer selection and KBQA is improved. Also, the multi-view attention scheme is shown to be effective in assembling attentive information from different representational perspectives. |
Tasks | Answer Selection, Knowledge Base Question Answering, Multi-Task Learning, Question Answering |
Published | 2018-12-06 |
URL | http://arxiv.org/abs/1812.02354v1 |
PDF | http://arxiv.org/pdf/1812.02354v1.pdf |
PWC | https://paperswithcode.com/paper/multi-task-learning-with-multi-view-attention |
Repo | https://github.com/dengyang17/MTQA |
Framework | none |
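A hedged sketch of the multi-task setup only, with a shared sentence encoder and one ranking head per task; the paper's multi-view attention mechanism is not reproduced here, and all dimensions are illustrative.

```python
import torch
import torch.nn as nn

class SharedMTLRanker(nn.Module):
    def __init__(self, vocab_size=30000, dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.encoder = nn.LSTM(dim, dim, batch_first=True)    # shared across tasks
        self.answer_selection_head = nn.Linear(2 * dim, 1)    # text-level ranking
        self.kbqa_head = nn.Linear(2 * dim, 1)                # knowledge-level ranking

    def encode(self, ids):
        _, (h, _) = self.encoder(self.embed(ids))
        return h[-1]                                           # (B, dim) sentence vector

    def forward(self, question_ids, candidate_ids, task):
        pair = torch.cat([self.encode(question_ids), self.encode(candidate_ids)], dim=-1)
        head = self.answer_selection_head if task == "answer_selection" else self.kbqa_head
        return head(pair).squeeze(-1)                          # ranking score per pair

model = SharedMTLRanker()
q = torch.randint(0, 30000, (4, 12))
c = torch.randint(0, 30000, (4, 30))
print(model(q, c, task="answer_selection").shape)              # torch.Size([4])
```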
Topic Discovery in Massive Text Corpora Based on Min-Hashing
Title | Topic Discovery in Massive Text Corpora Based on Min-Hashing |
Authors | Gibran Fuentes-Pineda, Ivan Vladimir Meza-Ruiz |
Abstract | The task of discovering topics in text corpora has been dominated by Latent Dirichlet Allocation and other Topic Models for over a decade. In order to apply these approaches to massive text corpora, the vocabulary needs to be reduced considerably and large computer clusters and/or GPUs are typically required. Moreover, the number of topics must be provided beforehand, but this depends on the corpus characteristics and it is often difficult to estimate, especially for massive text corpora. Unfortunately, both topic quality and time complexity are sensitive to this choice. This paper describes an alternative approach to discover topics based on Min-Hashing, which can handle massive text corpora and large vocabularies using modest computer hardware and does not require fixing the number of topics in advance. The basic idea is to generate multiple random partitions of the corpus vocabulary to find sets of highly co-occurring words, which are then clustered to produce the final topics. In contrast to probabilistic topic models where topics are distributions over the complete vocabulary, the topics discovered by the proposed approach are sets of highly co-occurring words. Interestingly, these topics underlie various themes with different levels of granularity. An extensive qualitative and quantitative evaluation using the 20 Newsgroups (18K), Reuters (800K), Spanish Wikipedia (1M), and English Wikipedia (5M) corpora shows that the proposed approach is able to consistently discover meaningful and coherent topics. Remarkably, the time complexity of the proposed approach is linear with respect to corpus and vocabulary size; a non-parallel implementation was able to discover topics from the entire English edition of Wikipedia with over 5 million documents and 1 million words in less than 7 hours. |
Tasks | Topic Models |
Published | 2018-07-03 |
URL | https://arxiv.org/abs/1807.00938v2 |
PDF | https://arxiv.org/pdf/1807.00938v2.pdf |
PWC | https://paperswithcode.com/paper/topic-discovery-in-massive-text-corpora-based |
Repo | https://github.com/gibranfp/Sampled-MinHashing |
Framework | none |
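A hedged toy sketch of the core Min-Hashing step: words whose document-occurrence sets collide under a tuple of min-hash values tend to co-occur. The full Sampled-MinHashing method repeats this over many random partitions and clusters the resulting word sets into topics, which is omitted here.

```python
import random
from collections import defaultdict

def minhash_partition(word_to_docs, num_hashes=3, seed=0):
    rng = random.Random(seed)
    salts = [rng.getrandbits(32) for _ in range(num_hashes)]  # one salt per hash function
    buckets = defaultdict(list)
    for word, docs in word_to_docs.items():
        # min-hash signature of the word's document-occurrence set
        signature = tuple(min(hash((salt, d)) for d in docs) for salt in salts)
        buckets[signature].append(word)
    # words sharing a signature co-occur in (roughly) the same documents
    return [words for words in buckets.values() if len(words) > 1]

word_to_docs = {"goal": {1, 2, 5}, "match": {1, 2, 5}, "election": {3, 4}, "vote": {3, 4}}
print(minhash_partition(word_to_docs))
```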
X-ray-transform Invariant Anatomical Landmark Detection for Pelvic Trauma Surgery
Title | X-ray-transform Invariant Anatomical Landmark Detection for Pelvic Trauma Surgery |
Authors | Bastian Bier, Mathias Unberath, Jan-Nico Zaech, Javad Fotouhi, Mehran Armand, Greg Osgood, Nassir Navab, Andreas Maier |
Abstract | X-ray image guidance enables percutaneous alternatives to complex procedures. Unfortunately, the indirect view onto the anatomy in addition to projective simplification substantially increase the task-load for the surgeon. Additional 3D information such as knowledge of anatomical landmarks can benefit surgical decision making in complicated scenarios. Automatic detection of these landmarks in transmission imaging is challenging since image-domain features characteristic to a certain landmark change substantially depending on the viewing direction. Consequently and to the best of our knowledge, the above problem has not yet been addressed. In this work, we present a method to automatically detect anatomical landmarks in X-ray images independent of the viewing direction. To this end, a sequential prediction framework based on convolutional layers is trained on synthetically generated data of the pelvic anatomy to predict 23 landmarks in single X-ray images. View independence is contingent on training conditions and, here, is achieved on a spherical segment covering (120 x 90) degrees in LAO/RAO and CRAN/CAUD, respectively, centered around AP. On synthetic data, the proposed approach achieves a mean prediction error of 5.6 +- 4.5 mm. We demonstrate that the proposed network is immediately applicable to clinically acquired data of the pelvis. In particular, we show that our intra-operative landmark detection together with pre-operative CT enables X-ray pose estimation which, ultimately, benefits initialization of image-based 2D/3D registration. |
Tasks | Decision Making, Pose Estimation |
Published | 2018-03-22 |
URL | http://arxiv.org/abs/1803.08608v1 |
PDF | http://arxiv.org/pdf/1803.08608v1.pdf |
PWC | https://paperswithcode.com/paper/x-ray-transform-invariant-anatomical-landmark |
Repo | https://github.com/mathiasunberath/DeepDRR |
Framework | pytorch |
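A hedged sketch of heatmap-based landmark prediction, assuming the common convention of one output heatmap per landmark with the location read off as the per-channel argmax; the paper's sequential (multi-stage) prediction framework is not reproduced here.

```python
import torch
import torch.nn as nn

NUM_LANDMARKS = 23
net = nn.Sequential(nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
                    nn.Conv2d(32, NUM_LANDMARKS, 1))           # one heatmap per landmark

xray = torch.randn(1, 1, 256, 256)
heatmaps = net(xray)                                            # (1, 23, 256, 256)
flat = heatmaps.flatten(2).argmax(dim=2)                        # (1, 23) flat indices
rows = torch.div(flat, 256, rounding_mode="floor")
cols = flat % 256
coords = torch.stack([rows, cols], dim=-1)                      # (1, 23, 2) as (row, col)
```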
Pedestrian-Synthesis-GAN: Generating Pedestrian Data in Real Scene and Beyond
Title | Pedestrian-Synthesis-GAN: Generating Pedestrian Data in Real Scene and Beyond |
Authors | Xi Ouyang, Yu Cheng, Yifan Jiang, Chun-Liang Li, Pan Zhou |
Abstract | State-of-the-art pedestrian detection models have achieved great success in many benchmarks. However, these models require a large amount of annotated data, and the labeling process usually takes considerable time and effort. In this paper, we propose a method to generate labeled pedestrian data and adapt them to support the training of pedestrian detectors. The proposed framework is built on the Generative Adversarial Network (GAN) with multiple discriminators, trying to synthesize realistic pedestrians and learn the background context simultaneously. To handle pedestrians of different sizes, we adopt the Spatial Pyramid Pooling (SPP) layer in the discriminator. We conduct experiments on two benchmarks. The results show that our framework can smoothly synthesize pedestrians on background images with varying content and levels of detail. To quantitatively evaluate our approach, we add the generated samples into the training data of the baseline pedestrian detectors and show that the synthetic images are able to improve the detectors’ performance. |
Tasks | Pedestrian Detection |
Published | 2018-04-05 |
URL | http://arxiv.org/abs/1804.02047v2 |
PDF | http://arxiv.org/pdf/1804.02047v2.pdf |
PWC | https://paperswithcode.com/paper/pedestrian-synthesis-gan-generating |
Repo | https://github.com/HilmiK/PS-Gan-modified |
Framework | pytorch |
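A hedged sketch of a Spatial Pyramid Pooling layer of the kind used in the discriminator to handle pedestrians of different sizes: pooling the feature map at several fixed grid resolutions and concatenating yields a fixed-length descriptor regardless of input size. The pyramid levels are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SPP(nn.Module):
    def __init__(self, levels=(1, 2, 4)):
        super().__init__()
        self.levels = levels

    def forward(self, x):                        # x: (B, C, H, W), any H and W
        pooled = [F.adaptive_max_pool2d(x, level).flatten(1) for level in self.levels]
        return torch.cat(pooled, dim=1)          # (B, C * sum(l * l for l in levels))

spp = SPP()
print(spp(torch.randn(2, 64, 37, 21)).shape)     # torch.Size([2, 1344]), independent of H, W
```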
Deep Network Interpolation for Continuous Imagery Effect Transition
Title | Deep Network Interpolation for Continuous Imagery Effect Transition |
Authors | Xintao Wang, Ke Yu, Chao Dong, Xiaoou Tang, Chen Change Loy |
Abstract | Deep convolutional neural network has demonstrated its capability of learning a deterministic mapping for the desired imagery effect. However, the large variety of user flavors motivates the possibility of continuous transition among different output effects. Unlike existing methods that require a specific design to achieve one particular transition (e.g., style transfer), we propose a simple yet universal approach to attain a smooth control of diverse imagery effects in many low-level vision tasks, including image restoration, image-to-image translation, and style transfer. Specifically, our method, namely Deep Network Interpolation (DNI), applies linear interpolation in the parameter space of two or more correlated networks. A smooth control of imagery effects can be achieved by tweaking the interpolation coefficients. In addition to DNI and its broad applications, we also investigate the mechanism of network interpolation from the perspective of learned filters. |
Tasks | Image Restoration, Image-to-Image Translation, Style Transfer |
Published | 2018-11-26 |
URL | http://arxiv.org/abs/1811.10515v1 |
PDF | http://arxiv.org/pdf/1811.10515v1.pdf |
PWC | https://paperswithcode.com/paper/deep-network-interpolation-for-continuous |
Repo | https://github.com/xinntao/DNI |
Framework | pytorch |
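A minimal sketch of the DNI operation described in the abstract: linearly interpolate the parameters of two networks with the same architecture and load the result to obtain an intermediate imagery effect.

```python
import torch

def interpolate_state_dicts(state_a, state_b, alpha):
    # alpha = 0 reproduces network A, alpha = 1 reproduces network B
    return {k: (1 - alpha) * state_a[k] + alpha * state_b[k] for k in state_a}

# usage (all three models share the same architecture):
# model_interp.load_state_dict(
#     interpolate_state_dicts(model_a.state_dict(), model_b.state_dict(), alpha=0.5))
```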
Challenges in detecting evolutionary forces in language change using diachronic corpora
Title | Challenges in detecting evolutionary forces in language change using diachronic corpora |
Authors | Andres Karjus, Richard A. Blythe, Simon Kirby, Kenny Smith |
Abstract | Newberry et al. (Detecting evolutionary forces in language change, Nature 551, 2017) tackle an important but difficult problem in linguistics, the testing of selective theories of language change against a null model of drift. Having applied a test from population genetics (the Frequency Increment Test) to a number of relevant examples, they suggest stochasticity has a previously under-appreciated role in language evolution. We replicate their results and find that while the overall observation holds, results produced by this approach on individual time series can be sensitive to how the corpus is organized into temporal segments (binning). Furthermore, we use a large set of simulations in conjunction with binning to systematically explore the range of applicability of the Frequency Increment Test. We conclude that care should be exercised with interpreting results of tests like the Frequency Increment Test on individual series, given the researcher degrees of freedom available when applying the test to corpus data, and fundamental differences between genetic and linguistic data. Our findings have implications for selection testing and temporal binning in general, as well as demonstrating the usefulness of simulations for evaluating methods newly introduced to the field. |
Tasks | Time Series |
Published | 2018-11-03 |
URL | https://arxiv.org/abs/1811.01275v2 |
PDF | https://arxiv.org/pdf/1811.01275v2.pdf |
PWC | https://paperswithcode.com/paper/challenges-in-detecting-evolutionary-forces |
Repo | https://github.com/andreskarjus/wfsim_fit |
Framework | none |
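A hedged sketch of the Frequency Increment Test in its common formulation (Feder et al. 2014): rescale the frequency increments and t-test whether their mean differs from zero, as drift predicts a zero mean. The temporal binning that the replication shows the test is sensitive to happens before this step; the numbers below are illustrative.

```python
import numpy as np
from scipy import stats

def frequency_increment_test(freqs, times):
    freqs, times = np.asarray(freqs, float), np.asarray(times, float)
    # rescaled increments are approximately standard normal under pure drift
    increments = np.diff(freqs) / np.sqrt(
        2 * freqs[:-1] * (1 - freqs[:-1]) * np.diff(times))
    return stats.ttest_1samp(increments, popmean=0.0)

# variant frequency per temporal bin (illustrative numbers)
print(frequency_increment_test([0.20, 0.28, 0.35, 0.46, 0.60], [1, 2, 3, 4, 5]))
```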
Towards Understanding Learning Representations: To What Extent Do Different Neural Networks Learn the Same Representation
Title | Towards Understanding Learning Representations: To What Extent Do Different Neural Networks Learn the Same Representation |
Authors | Liwei Wang, Lunjia Hu, Jiayuan Gu, Yue Wu, Zhiqiang Hu, Kun He, John Hopcroft |
Abstract | It is widely believed that learning good representations is one of the main reasons for the success of deep neural networks. Although highly intuitive, there is a lack of theory and systematic approach quantitatively characterizing what representations deep neural networks learn. In this work, we move a tiny step towards a theory and better understanding of the representations. Specifically, we study a simpler problem: How similar are the representations learned by two networks with identical architecture but trained from different initializations? We develop a rigorous theory based on the neuron activation subspace match model. The theory gives a complete characterization of the structure of neuron activation subspace matches, where the core concepts are maximum match and simple match, which describe the overall and the finest similarity between sets of neurons in two networks, respectively. We also propose efficient algorithms to find the maximum match and simple matches. Finally, we conduct extensive experiments using our algorithms. Experimental results suggest that, surprisingly, representations learned by the same convolutional layers of networks trained from different initializations are not as similar as prevalently expected, at least in terms of subspace match. |
Tasks | |
Published | 2018-10-28 |
URL | http://arxiv.org/abs/1810.11750v2 |
PDF | http://arxiv.org/pdf/1810.11750v2.pdf |
PWC | https://paperswithcode.com/paper/towards-understanding-learning |
Repo | https://github.com/MeckyWu/subspace-match |
Framework | pytorch |
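A hedged sketch of one way to probe subspace similarity (not the authors' maximum/simple match algorithms): for each neuron in one layer, measure how well its activation vector over a probe set is explained by the span of the other layer's activations via a least-squares residual.

```python
import torch

def residual_to_subspace(X, Y):
    # X: (num_inputs, n_x) activations of layer A; Y: (num_inputs, n_y) of layer B
    sol = torch.linalg.lstsq(Y, X).solution        # express X's neurons in span(Y)
    residual = X - Y @ sol
    return residual.norm(dim=0) / X.norm(dim=0)    # relative residual per neuron of A

X = torch.randn(512, 64)
Y = torch.randn(512, 64)
print(residual_to_subspace(X, Y).mean())
```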
On the Importance of Stereo for Accurate Depth Estimation: An Efficient Semi-Supervised Deep Neural Network Approach
Title | On the Importance of Stereo for Accurate Depth Estimation: An Efficient Semi-Supervised Deep Neural Network Approach |
Authors | Nikolai Smolyanskiy, Alexey Kamenev, Stan Birchfield |
Abstract | We revisit the problem of visual depth estimation in the context of autonomous vehicles. Despite the progress on monocular depth estimation in recent years, we show that the gap between monocular and stereo depth accuracy remains large - a particularly relevant result due to the prevalent reliance upon monocular cameras by vehicles that are expected to be self-driving. We argue that the challenges of removing this gap are significant, owing to fundamental limitations of monocular vision. As a result, we focus our efforts on depth estimation by stereo. We propose a novel semi-supervised learning approach to training a deep stereo neural network, along with a novel architecture containing a machine-learned argmax layer and a custom runtime (that will be shared publicly) that enables a smaller version of our stereo DNN to run on an embedded GPU. Competitive results are shown on the KITTI 2015 stereo dataset. We also evaluate the recent progress of stereo algorithms by measuring the impact upon accuracy of various design criteria. |
Tasks | Autonomous Vehicles, Depth Estimation, Stereo Depth Estimation |
Published | 2018-03-26 |
URL | http://arxiv.org/abs/1803.09719v3 |
PDF | http://arxiv.org/pdf/1803.09719v3.pdf |
PWC | https://paperswithcode.com/paper/on-the-importance-of-stereo-for-accurate |
Repo | https://github.com/iitmcvg/redtail |
Framework | tf |
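A hedged sketch of a differentiable (soft) argmax over a disparity cost volume, the kind of operation the paper's machine-learned argmax layer replaces with a learned variant; this fixed version is shown only to make the idea concrete.

```python
import torch
import torch.nn.functional as F

def soft_argmax_disparity(cost_volume):
    # cost_volume: (B, D, H, W); lower cost = better disparity hypothesis
    prob = F.softmax(-cost_volume, dim=1)
    disparities = torch.arange(cost_volume.shape[1], dtype=prob.dtype,
                               device=prob.device).view(1, -1, 1, 1)
    return (prob * disparities).sum(dim=1)          # (B, H, W) expected disparity

print(soft_argmax_disparity(torch.randn(1, 96, 32, 64)).shape)
```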
Real-Time Joint Semantic Segmentation and Depth Estimation Using Asymmetric Annotations
Title | Real-Time Joint Semantic Segmentation and Depth Estimation Using Asymmetric Annotations |
Authors | Vladimir Nekrasov, Thanuja Dharmasiri, Andrew Spek, Tom Drummond, Chunhua Shen, Ian Reid |
Abstract | Deployment of deep learning models in robotics as sensory information extractors can be a daunting task to handle, even using generic GPU cards. Here, we address three of its most prominent hurdles, namely, i) the adaptation of a single model to perform multiple tasks at once (in this work, we consider depth estimation and semantic segmentation crucial for acquiring geometric and semantic understanding of the scene), while ii) doing it in real-time, and iii) using asymmetric datasets with uneven numbers of annotations per modality. To overcome the first two issues, we adapt a recently proposed real-time semantic segmentation network, making changes to further reduce the number of floating point operations. To approach the third issue, we embrace a simple solution based on hard knowledge distillation under the assumption of having access to a powerful 'teacher' network. We showcase how our system can be easily extended to handle more tasks, and more datasets, all at once, performing depth estimation and segmentation both indoors and outdoors with a single model. Quantitatively, we achieve results equivalent to (or better than) current state-of-the-art approaches with one forward pass costing just 13ms and 6.5 GFLOPs on 640x480 inputs. This efficiency allows us to directly incorporate the raw predictions of our network into the SemanticFusion framework for dense 3D semantic reconstruction of the scene. |
Tasks | Depth Estimation, Real-Time Semantic Segmentation, Semantic Segmentation, Surface Normals Estimation |
Published | 2018-09-13 |
URL | http://arxiv.org/abs/1809.04766v2 |
PDF | http://arxiv.org/pdf/1809.04766v2.pdf |
PWC | https://paperswithcode.com/paper/real-time-joint-semantic-segmentation-and |
Repo | https://github.com/DrSleep/light-weight-refinenet |
Framework | pytorch |
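A hedged sketch of training with asymmetric annotations via hard knowledge distillation: where ground-truth depth is missing, a pre-trained teacher's depth prediction serves as the target, while segmentation uses the real labels. The losses and weighting are illustrative, not the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def joint_loss(seg_logits, depth_pred, seg_labels, depth_teacher, depth_weight=0.5):
    seg_loss = F.cross_entropy(seg_logits, seg_labels, ignore_index=255)  # real labels
    depth_loss = F.l1_loss(depth_pred, depth_teacher)   # teacher prediction as hard target
    return seg_loss + depth_weight * depth_loss

seg_logits = torch.randn(2, 21, 64, 64)
depth_pred = torch.rand(2, 1, 64, 64)
seg_labels = torch.randint(0, 21, (2, 64, 64))
depth_teacher = torch.rand(2, 1, 64, 64)
print(joint_loss(seg_logits, depth_pred, seg_labels, depth_teacher))
```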