Paper Group ANR 634
Constraint-Based Visual Generation
Title | Constraint-Based Visual Generation |
Authors | Giuseppe Marra, Francesco Giannini, Michelangelo Diligenti, Marco Gori |
Abstract | In the last few years, the systematic adoption of deep learning for visual generation has produced impressive results that, amongst others, definitely benefit from the massive exploration of convolutional architectures. In this paper, we propose a general approach to visual generation that combines learning capabilities with logic descriptions of the target to be generated. The process of generation is regarded as a constraint satisfaction problem, where the constraints describe a set of properties that characterize the target. Interestingly, the constraints can also involve logic variables, while all of them are converted into real-valued functions by means of t-norm theory. We use deep architectures to model the involved variables, and propose a computational scheme in which the learning process carries out the satisfaction of the constraints. We present some examples in which the theory can naturally be applied, including the modeling of GANs and auto-encoders, and report promising results on problems involving the generation of handwritten characters and face transformations. |
Tasks | |
Published | 2018-07-16 |
URL | https://arxiv.org/abs/1807.09202v3 |
https://arxiv.org/pdf/1807.09202v3.pdf | |
PWC | https://paperswithcode.com/paper/constraint-based-visual-generation |
Repo | |
Framework | |
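The abstract's conversion of logic constraints into real-valued functions via t-norm theory can be made concrete in a few lines. Below is a minimal sketch (not the authors' implementation) using the product t-norm; the example constraint and its truth values are invented for illustration.

```python
# Illustrative sketch: product t-norm semantics turn logic formulas over [0, 1]
# truth values into differentiable quantities; (1 - truth) serves as a penalty
# that a learning process can minimise alongside its usual objective.

def t_and(a, b):     # product t-norm: conjunction
    return a * b

def t_or(a, b):      # dual t-conorm: disjunction
    return a + b - a * b

def t_not(a):        # strong negation
    return 1.0 - a

def t_implies(a, b): # residuum of the product t-norm (Goguen implication)
    return 1.0 if a <= b else b / a

# Invented example constraint: "if the generator believes the digit is a 7,
# the classifier must agree". With generator belief 0.9 and classifier
# output 0.6, the constraint loss to minimise would be:
loss = 1.0 - t_implies(0.9, 0.6)   # = 1 - 0.6/0.9 ≈ 0.333
print(loss)
```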
Learning Adaptive Discriminative Correlation Filters via Temporal Consistency Preserving Spatial Feature Selection for Robust Visual Tracking
Title | Learning Adaptive Discriminative Correlation Filters via Temporal Consistency Preserving Spatial Feature Selection for Robust Visual Tracking |
Authors | Tianyang Xu, Zhen-Hua Feng, Xiao-Jun Wu, Josef Kittler |
Abstract | With efficient appearance learning models, Discriminative Correlation Filter (DCF) has been proven to be very successful in recent video object tracking benchmarks and competitions. However, the existing DCF paradigm suffers from two major issues, i.e., spatial boundary effect and temporal filter degradation. To mitigate these challenges, we propose a new DCF-based tracking method. The key innovations of the proposed method include adaptive spatial feature selection and temporally consistent constraints, with which the new tracker enables joint spatial-temporal filter learning in a lower-dimensional discriminative manifold. More specifically, we apply structured spatial sparsity constraints to multi-channel filters. Consequently, the process of learning spatial filters can be approximated by the lasso regularisation. To encourage temporal consistency, the filter model is restricted to lie around its historical value and updated locally to preserve the global structure in the manifold. Finally, a unified optimisation framework is proposed to jointly select temporal consistency preserving spatial features and learn discriminative filters with the augmented Lagrangian method. Qualitative and quantitative evaluations have been conducted on a number of well-known benchmarking datasets such as OTB2013, OTB50, OTB100, Temple-Colour, UAV123 and VOT2018. The experimental results demonstrate the superiority of the proposed method over the state-of-the-art approaches. |
Tasks | Feature Selection, Object Tracking, Video Object Tracking, Visual Tracking |
Published | 2018-07-30 |
URL | https://arxiv.org/abs/1807.11348v3 |
https://arxiv.org/pdf/1807.11348v3.pdf | |
PWC | https://paperswithcode.com/paper/learning-adaptive-discriminative-correlation |
Repo | |
Framework | |
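The lasso-regularised spatial feature selection mentioned in the abstract can be pictured with a toy proximal-gradient (ISTA) loop; everything below (data shapes, step size, penalty weight) is an invented stand-in, not the paper's augmented-Lagrangian solver.

```python
# Toy sketch: soft-thresholding drives unhelpful filter weights to exactly zero,
# which is the mechanism behind structured spatial sparsity / feature selection.
import numpy as np

def soft_threshold(w, lam):
    """Proximal operator of the l1 norm."""
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)

def ista_step(w, X, y, step=5e-3, lam=0.05):
    """One lasso step: gradient on the data term, then soft-thresholding.
    X: (n_samples, n_features) vectorised patches; y: desired responses."""
    grad = X.T @ (X @ w - y) / len(y)
    return soft_threshold(w - step * grad, step * lam)

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 100))
w_true = np.zeros(100); w_true[:5] = 1.0      # only a few spatial features matter
y = X @ w_true + 0.01 * rng.normal(size=64)
w = np.zeros(100)
for _ in range(500):
    w = ista_step(w, X, y)
print("non-zero filter weights:", np.count_nonzero(w))
```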
Crowdsourcing Lung Nodules Detection and Annotation
Title | Crowdsourcing Lung Nodules Detection and Annotation |
Authors | Saeed Boorboor, Saad Nadeem, Ji Hwan Park, Kevin Baker, Arie Kaufman |
Abstract | We present crowdsourcing as an additional modality to aid radiologists in the diagnosis of lung cancer from clinical chest computed tomography (CT) scans. More specifically, a complete workflow is introduced which can help maximize the sensitivity of lung nodule detection by utilizing the collective intelligence of the crowd. We combine the concept of overlapping thin-slab maximum intensity projections (TS-MIPs) and cine viewing to render short videos that can be outsourced as an annotation task to the crowd. These videos are generated by linearly interpolating overlapping TS-MIPs of CT slices through the depth of each quadrant of a patient’s lung. The resultant videos are outsourced to an online community of non-expert users who, after a brief tutorial, annotate suspected nodules in these video segments. Using our crowdsourcing workflow, we achieved a lung nodule detection sensitivity of over 90% for 20 patient CT datasets (containing 178 lung nodules with sizes between 1 and 30mm), and only 47 false positives from a total of 1021 annotations on nodules of all sizes (96% sensitivity for nodules >4mm). These results show that crowdsourcing can be a robust and scalable modality to aid radiologists in screening for lung cancer, directly or in combination with computer-aided detection (CAD) algorithms. For CAD algorithms, the presented workflow can provide highly accurate training data to overcome the high false-positive rate (per scan) problem. We also provide, for the first time, an analysis of nodule size and position which can help improve CAD algorithms. |
Tasks | Computed Tomography (CT), Lung Nodule Detection |
Published | 2018-09-17 |
URL | http://arxiv.org/abs/1809.06402v1 |
http://arxiv.org/pdf/1809.06402v1.pdf | |
PWC | https://paperswithcode.com/paper/crowdsourcing-lung-nodules-detection-and |
Repo | |
Framework | |
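A minimal sketch of the thin-slab maximum intensity projection (TS-MIP) frames described in the abstract above, assuming a simple (slices, H, W) volume layout; slab size and stride are invented parameters.

```python
# Each video frame is the per-pixel maximum over a thin slab of CT slices;
# consecutive slabs overlap so a nodule persists across several frames.
import numpy as np

def thin_slab_mips(volume, slab=8, stride=2):
    """volume: (n_slices, H, W) CT intensities -> list of (H, W) MIP frames."""
    n = volume.shape[0]
    return [volume[i:i + slab].max(axis=0) for i in range(0, n - slab + 1, stride)]

ct = np.random.rand(120, 64, 64)   # stand-in for one lung quadrant
frames = thin_slab_mips(ct)        # frames to interpolate into a short cine video
print(len(frames), frames[0].shape)
```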
Depth-Adaptive Computational Policies for Efficient Visual Tracking
Title | Depth-Adaptive Computational Policies for Efficient Visual Tracking |
Authors | Chris Ying, Katerina Fragkiadaki |
Abstract | Current convolutional neural network algorithms for video object tracking spend the same amount of computation on each object and video frame. However, it is harder to track an object in some frames than others, due to varying amounts of clutter, scene complexity, and motion, and the object’s distinctiveness against its background. We propose a depth-adaptive convolutional Siamese network that performs video tracking adaptively at multiple neural network depths. Parametric gating functions are trained to control the depth of the convolutional feature extractor by minimizing a joint loss of computational cost and tracking error. Our network achieves accuracy comparable to the state-of-the-art on the VOT2016 benchmark. Furthermore, our adaptive depth computation achieves higher accuracy for a given computational cost than traditional fixed-structure neural networks. The presented framework extends to other tasks that use convolutional neural networks and enables trading speed for accuracy at runtime. |
Tasks | Object Tracking, Video Object Tracking, Visual Tracking |
Published | 2018-01-01 |
URL | http://arxiv.org/abs/1801.00508v1 |
http://arxiv.org/pdf/1801.00508v1.pdf | |
PWC | https://paperswithcode.com/paper/depth-adaptive-computational-policies-for |
Repo | |
Framework | |
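The gating idea from the abstract, sketched under assumptions (channel counts, gate form, and cost model are all invented): parametric gates emit a halting probability after each convolutional stage, and an expected-cost term is added to the tracking loss.

```python
# Illustrative sketch, not the paper's architecture: deeper stages are reached
# with lower probability, so reach.sum() approximates expected depth/cost.
import torch
import torch.nn as nn

class GatedExtractor(nn.Module):
    def __init__(self, channels=16, stages=4, cost_weight=0.1):
        super().__init__()
        self.stages = nn.ModuleList(
            nn.Conv2d(3 if i == 0 else channels, channels, 3, padding=1)
            for i in range(stages))
        self.gates = nn.ModuleList(nn.Linear(channels, 1) for _ in range(stages))
        self.cost_weight = cost_weight

    def forward(self, x):
        halt_probs = []
        for conv, gate in zip(self.stages, self.gates):
            x = torch.relu(conv(x))
            pooled = x.mean(dim=(2, 3))           # global average per channel
            halt_probs.append(torch.sigmoid(gate(pooled)))
        p = torch.cat(halt_probs, dim=1)          # (batch, stages)
        reach = torch.cumprod(1 - p, dim=1)       # prob. of continuing past each gate
        expected_cost = reach.sum(dim=1).mean()
        return x, self.cost_weight * expected_cost  # cost term joins the tracking loss

feats, cost_term = GatedExtractor()(torch.randn(2, 3, 32, 32))
```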
Best arm identification in multi-armed bandits with delayed feedback
Title | Best arm identification in multi-armed bandits with delayed feedback |
Authors | Aditya Grover, Todor Markov, Peter Attia, Norman Jin, Nicholas Perkins, Bryan Cheong, Michael Chen, Zi Yang, Stephen Harris, William Chueh, Stefano Ermon |
Abstract | We propose a generalization of the best arm identification problem in stochastic multi-armed bandits (MAB) to the setting where every pull of an arm is associated with delayed feedback. The delay in feedback increases the effective sample complexity of standard algorithms, but can be offset if we have access to partial feedback received before a pull is completed. We propose a general framework to model the relationship between partial and delayed feedback, and as a special case we introduce efficient algorithms for settings where the partial feedback is a biased or unbiased estimator of the delayed feedback. Additionally, we propose a novel extension of the algorithms to the parallel MAB setting where an agent can control a batch of arms. Our experiments in real-world settings, involving policy search and hyperparameter optimization in computational sustainability domains for fast charging of batteries and wildlife corridor construction, demonstrate that exploiting the structure of partial feedback can lead to significant improvements over baselines in both sequential and parallel MAB. |
Tasks | Hyperparameter Optimization, Multi-Armed Bandits |
Published | 2018-03-29 |
URL | http://arxiv.org/abs/1803.10937v1 |
http://arxiv.org/pdf/1803.10937v1.pdf | |
PWC | https://paperswithcode.com/paper/best-arm-identification-in-multi-armed |
Repo | |
Framework | |
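A toy simulation of the unbiased-partial-feedback setting, not the authors' algorithm verbatim: successive elimination where every pull immediately yields a noisy unbiased estimate of its eventual delayed reward, so suboptimal arms can be dropped before their delayed rewards fully arrive. Arm means and noise scale are invented.

```python
# Sketch: an arm stays active only while its upper confidence bound still
# exceeds the best lower confidence bound among active arms.
import numpy as np

rng = np.random.default_rng(0)
means = np.array([0.3, 0.5, 0.7])          # hypothetical true arm means
active = [0, 1, 2]
sums, counts = np.zeros(3), np.zeros(3)

for t in range(1, 2001):
    for a in active:
        # Partial feedback: unbiased but noisier than the delayed reward itself.
        sums[a] += means[a] + rng.normal(scale=0.5)
        counts[a] += 1
    idx = np.array(active)
    mu = sums[idx] / counts[idx]
    radius = np.sqrt(np.log(4 * t) / counts[idx])
    lcb_best = (mu - radius).max()
    active = [a for a, m, r in zip(active, mu, radius) if m + r >= lcb_best]
    if len(active) == 1:
        break
print("identified best arm:", active[0])
```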
Automatic Lung Cancer Prediction from Chest X-ray Images Using Deep Learning Approach
Title | Automatic Lung Cancer Prediction from Chest X-ray Images Using Deep Learning Approach |
Authors | Worawate Ausawalaithong, Sanparith Marukatat, Arjaree Thirach, Theerawit Wilaiprasitporn |
Abstract | Since cancer is curable when diagnosed at an early stage, lung cancer screening plays an important role in preventive care. Although both low-dose computed tomography (LDCT) and computed tomography (CT) scans provide more medical information than normal chest x-rays, there is very limited access to these technologies in rural areas. Recently, there has been a trend towards using computer-aided diagnosis (CADx) to assist in the screening and diagnosis of cancer from biomedical images. In this study, the 121-layer convolutional neural network known as DenseNet-121 (G. Huang et al.), along with a transfer learning scheme, was explored as a means of classifying lung cancer from chest X-ray images. The model was trained on a lung nodule dataset before training on the lung cancer dataset to alleviate the problem of a small dataset. The proposed model yields 74.43±6.01% mean accuracy, 74.96±9.85% mean specificity, and 74.68±15.33% mean sensitivity. The proposed model also provides a heatmap for identifying the location of the lung nodule. These findings are promising for further development of chest x-ray-based lung cancer diagnosis using the deep learning approach, and the two-stage training scheme mitigates the small-dataset problem. |
Tasks | Computed Tomography (CT), Lung Cancer Diagnosis, Transfer Learning |
Published | 2018-08-31 |
URL | http://arxiv.org/abs/1808.10858v1 |
http://arxiv.org/pdf/1808.10858v1.pdf | |
PWC | https://paperswithcode.com/paper/automatic-lung-cancer-prediction-from-chest-x |
Repo | |
Framework | |
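A hedged sketch of the two-stage transfer-learning setup described above, using torchvision's DenseNet-121; the datasets and training loops are placeholders, not the paper's data or code.

```python
# Stage 1 adapts an ImageNet-pretrained backbone on nodule labels; stage 2
# reuses that backbone with a fresh head for the (smaller) cancer dataset.
import torch.nn as nn
from torchvision import models

model = models.densenet121(weights=models.DenseNet121_Weights.DEFAULT)
num_feats = model.classifier.in_features
model.classifier = nn.Linear(num_feats, 2)   # nodule vs. no nodule
# ... train on the lung-nodule chest X-ray dataset (loop omitted) ...

model.classifier = nn.Linear(num_feats, 2)   # cancer vs. no cancer
# ... fine-tune on the lung-cancer dataset (loop omitted) ...
```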
Recurrent-OctoMap: Learning State-based Map Refinement for Long-Term Semantic Mapping with 3D-Lidar Data
Title | Recurrent-OctoMap: Learning State-based Map Refinement for Long-Term Semantic Mapping with 3D-Lidar Data |
Authors | Li Sun, Zhi Yan, Anestis Zaganidis, Cheng Zhao, Tom Duckett |
Abstract | This paper presents a novel semantic mapping approach, Recurrent-OctoMap, learned from long-term 3D Lidar data. Most existing semantic mapping approaches focus on improving semantic understanding of single frames, rather than 3D refinement of semantic maps (i.e. fusing semantic observations). The most widely-used approach for 3D semantic map refinement is a Bayesian update, which fuses the consecutive predictive probabilities following a Markov-Chain model. Instead, we propose a learning approach to fuse the semantic features, rather than simply fusing predictions from a classifier. In our approach, we represent and maintain our 3D map as an OctoMap, and model each cell as a recurrent neural network (RNN), to obtain a Recurrent-OctoMap. In this case, the semantic mapping process can be formulated as a sequence-to-sequence encoding-decoding problem. Moreover, in order to extend the duration of observations in our Recurrent-OctoMap, we developed a robust 3D localization and mapping system for successively mapping a dynamic environment using more than two weeks of data, and the system can be trained and deployed with arbitrary memory length. We validate our approach on the ETH long-term 3D Lidar dataset [1]. The experimental results show that our proposed approach outperforms the conventional “Bayesian update” approach. |
Tasks | |
Published | 2018-07-02 |
URL | http://arxiv.org/abs/1807.00925v2 |
http://arxiv.org/pdf/1807.00925v2.pdf | |
PWC | https://paperswithcode.com/paper/recurrent-octomap-learning-state-based-map |
Repo | |
Framework | |
Knowledgeable Reader: Enhancing Cloze-Style Reading Comprehension with External Commonsense Knowledge
Title | Knowledgeable Reader: Enhancing Cloze-Style Reading Comprehension with External Commonsense Knowledge |
Authors | Todor Mihaylov, Anette Frank |
Abstract | We introduce a neural reading comprehension model that integrates external commonsense knowledge, encoded as a key-value memory, in a cloze-style setting. Instead of relying only on document-to-question interaction or discrete features as in prior work, our model attends to relevant external knowledge and combines this knowledge with the context representation before inferring the answer. This allows the model to draw on knowledge from an external source that is not explicitly stated in the text, but that is relevant for inferring the answer. Our model improves results over a very strong baseline on a hard Common Nouns dataset, making it a strong competitor of much more complex models. By including knowledge explicitly, our model can also provide evidence about the background knowledge used in the RC process. |
Tasks | Reading Comprehension |
Published | 2018-05-21 |
URL | http://arxiv.org/abs/1805.07858v1 |
http://arxiv.org/pdf/1805.07858v1.pdf | |
PWC | https://paperswithcode.com/paper/knowledgeable-reader-enhancing-cloze-style |
Repo | |
Framework | |
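The key-value memory access described in the abstract reduces to a simple attention read; below is an illustrative numpy sketch with invented dimensions, standing in for the paper's trained encoders.

```python
# Sketch: attend over knowledge keys with the context vector, read the
# corresponding values, and mix the result into the context representation
# before answer scoring.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

d, n_facts = 64, 100
rng = np.random.default_rng(0)
context = rng.normal(size=d)              # document/question representation
keys = rng.normal(size=(n_facts, d))      # encoded knowledge keys
values = rng.normal(size=(n_facts, d))    # encoded knowledge values

attn = softmax(keys @ context)            # relevance of each stored fact
knowledge = attn @ values                 # retrieved commonsense summary
enriched = np.tanh(context + knowledge)   # combined representation for inference
```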
Sequence Generation with Guider Network
Title | Sequence Generation with Guider Network |
Authors | Ruiyi Zhang, Changyou Chen, Zhe Gan, Wenlin Wang, Liqun Chen, Dinghan Shen, Guoyin Wang, Lawrence Carin |
Abstract | Sequence generation with reinforcement learning (RL) has received significant attention recently. However, a challenge with such methods is the sparse-reward problem in the RL training process, in which a scalar guiding signal is often only available after an entire sequence has been generated. This type of sparse reward tends to ignore the global structural information of a sequence, causing generation of sequences that are semantically inconsistent. In this paper, we present a model-based RL approach to overcome this issue. Specifically, we propose a novel guider network to model the sequence-generation environment, which can assist next-word prediction and provide intermediate rewards for generator optimization. Extensive experiments show that the proposed method leads to improved performance for both unconditional and conditional sequence-generation tasks. |
Tasks | |
Published | 2018-11-02 |
URL | http://arxiv.org/abs/1811.00696v1 |
http://arxiv.org/pdf/1811.00696v1.pdf | |
PWC | https://paperswithcode.com/paper/sequence-generation-with-guider-network |
Repo | |
Framework | |
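One way to picture the guider's intermediate rewards (a rough sketch, not the paper's model): the guider predicts a representation of the future sequence, and the generator is rewarded at each step for moving toward that prediction, replacing the single end-of-sequence score.

```python
# Sketch: dense per-step reward from agreement between the guider's prediction
# and the representation of the actual continuation. All vectors are invented.
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def intermediate_reward(guider_pred, actual_future_repr):
    """Dense reward at each generation step, easing the sparse-reward problem."""
    return cosine(guider_pred, actual_future_repr)

rng = np.random.default_rng(0)
print(intermediate_reward(rng.normal(size=16), rng.normal(size=16)))
```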
Distributed Newton Methods for Deep Neural Networks
Title | Distributed Newton Methods for Deep Neural Networks |
Authors | Chien-Chih Wang, Kent Loong Tan, Chun-Ting Chen, Yu-Hsiang Lin, S. Sathiya Keerthi, Dhruv Mahajan, S. Sundararajan, Chih-Jen Lin |
Abstract | Deep learning involves a difficult non-convex optimization problem with a large number of weights between any two adjacent layers of a deep structure. To handle large data sets or complicated networks, distributed training is needed, but the calculation of function, gradient, and Hessian is expensive. In particular, the communication and the synchronization cost may become a bottleneck. In this paper, we focus on situations where the model is distributedly stored, and propose a novel distributed Newton method for training deep neural networks. Through variable and feature-wise data partitions and some careful design, we are able to explicitly use the Jacobian matrix for matrix-vector products in the Newton method. Some techniques are incorporated to reduce the running time as well as the memory consumption. First, to reduce the communication cost, we propose a diagonalization method such that an approximate Newton direction can be obtained without communication between machines. Second, we consider subsampled Gauss-Newton matrices for reducing the running time as well as the communication cost. Third, to reduce the synchronization cost, we terminate the process of finding an approximate Newton direction even though some nodes have not finished their tasks. Details of some implementation issues in distributed environments are thoroughly investigated. Experiments demonstrate that the proposed method is effective for the distributed training of deep neural networks. Compared with stochastic gradient methods, it is more robust and may give better test accuracy. |
Tasks | |
Published | 2018-02-01 |
URL | http://arxiv.org/abs/1802.00130v1 |
http://arxiv.org/pdf/1802.00130v1.pdf | |
PWC | https://paperswithcode.com/paper/distributed-newton-methods-for-deep-neural |
Repo | |
Framework | |
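The core linear algebra of such a Newton step can be sketched on a toy problem: conjugate gradient on the (damped) Gauss-Newton system using only Jacobian-vector products, so the Gauss-Newton matrix is never formed. The Jacobian below is random stand-in data, and the distributed partitioning is omitted.

```python
# Sketch: solve (J^T J + lam*I) d = -J^T r with plain CG, touching J only
# through matrix-vector products, as a Newton-type method would.
import numpy as np

rng = np.random.default_rng(0)
J = rng.normal(size=(200, 50))   # Jacobian of residuals w.r.t. weights
r = rng.normal(size=200)         # current residuals
lam = 1e-2

def gn_matvec(v):                # (J^T J + lam*I) v without forming J^T J
    return J.T @ (J @ v) + lam * v

b = -J.T @ r
d = np.zeros(50)
res = b - gn_matvec(d)
p = res.copy()
for _ in range(50):
    Ap = gn_matvec(p)
    alpha = res @ res / (p @ Ap)
    d += alpha * p
    new_res = res - alpha * Ap
    beta = new_res @ new_res / (res @ res)
    p, res = new_res + beta * p, new_res
print("Newton direction norm:", np.linalg.norm(d))
```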
DCAN: Dual Channel-wise Alignment Networks for Unsupervised Scene Adaptation
Title | DCAN: Dual Channel-wise Alignment Networks for Unsupervised Scene Adaptation |
Authors | Zuxuan Wu, Xintong Han, Yen-Liang Lin, Mustafa Gökhan Uzunbas, Tom Goldstein, Ser Nam Lim, Larry S. Davis |
Abstract | Harvesting dense pixel-level annotations to train deep neural networks for semantic segmentation is extremely expensive and unwieldy at scale. While learning from synthetic data where labels are readily available sounds promising, performance degrades significantly when testing on novel realistic data due to domain discrepancies. We present Dual Channel-wise Alignment Networks (DCAN), a simple yet effective approach to reduce domain shift at both pixel-level and feature-level. Exploring statistics in each channel of CNN feature maps, our framework performs channel-wise feature alignment, which preserves spatial structures and semantic information, in both an image generator and a segmentation network. In particular, given an image from the source domain and unlabeled samples from the target domain, the generator synthesizes new images on-the-fly to resemble samples from the target domain in appearance and the segmentation network further refines high-level features before predicting semantic maps, both of which leverage feature statistics of sampled images from the target domain. Unlike much recent and concurrent work relying on adversarial training, our framework is lightweight and easy to train. Extensive experiments on adapting models trained on synthetic segmentation benchmarks to real urban scenes demonstrate the effectiveness of the proposed framework. |
Tasks | Semantic Segmentation |
Published | 2018-04-16 |
URL | http://arxiv.org/abs/1804.05827v1 |
http://arxiv.org/pdf/1804.05827v1.pdf | |
PWC | https://paperswithcode.com/paper/dcan-dual-channel-wise-alignment-networks-for |
Repo | |
Framework | |
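A minimal numpy sketch of the channel-wise alignment idea, with invented shapes: each channel of a source feature map is renormalised to the target domain's channel statistics, leaving spatial structure untouched.

```python
# Sketch: per-channel whitening with source statistics, re-colouring with
# target statistics — the alignment preserves spatial layout by construction.
import numpy as np

def channel_align(f_src, f_tgt, eps=1e-5):
    """f_src, f_tgt: (C, H, W) feature maps; returns f_src with f_tgt's stats."""
    mu_s = f_src.mean(axis=(1, 2), keepdims=True)
    sd_s = f_src.std(axis=(1, 2), keepdims=True)
    mu_t = f_tgt.mean(axis=(1, 2), keepdims=True)
    sd_t = f_tgt.std(axis=(1, 2), keepdims=True)
    return (f_src - mu_s) / (sd_s + eps) * sd_t + mu_t

rng = np.random.default_rng(0)
aligned = channel_align(rng.normal(size=(64, 32, 32)),
                        rng.normal(size=(64, 32, 32)))
```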
Detecting Comma-shaped Clouds for Severe Weather Forecasting using Shape and Motion
Title | Detecting Comma-shaped Clouds for Severe Weather Forecasting using Shape and Motion |
Authors | Xinye Zheng, Jianbo Ye, Yukun Chen, Stephen Wistar, Jia Li, Jose A. Piedra-Fernández, Michael A. Steinberg, James Z. Wang |
Abstract | Meteorologists use shapes and movements of clouds in satellite images as indicators of several major types of severe storms. Satellite imagery data are of increasingly higher resolution, both spatially and temporally, making it impossible for humans to fully leverage the data in their forecasts. Automatic satellite imagery analysis methods that can find storm-related cloud patterns as soon as they are detectable are in demand. We propose a machine-learning and pattern-recognition-based approach to detect “comma-shaped” clouds in satellite images, which are specific cloud distribution patterns strongly associated with cyclone formation. In order to detect regions with the targeted movement patterns, our method is trained on manually annotated cloud examples represented by both shape- and motion-sensitive features. Sliding windows at different scales are used to ensure that dense clouds will be captured, and we implement effective selection rules to shrink the region of interest among these sliding windows. Finally, we evaluate the method on a held-out annotated comma-shaped cloud dataset and cross-match the results with recorded storm events in the severe weather database. The validated utility and accuracy of our method suggest a high potential for assisting meteorologists in weather forecasting. |
Tasks | Weather Forecasting |
Published | 2018-02-25 |
URL | http://arxiv.org/abs/1802.08937v3 |
http://arxiv.org/pdf/1802.08937v3.pdf | |
PWC | https://paperswithcode.com/paper/detecting-comma-shaped-clouds-for-severe |
Repo | |
Framework | |
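A schematic sketch of the multi-scale sliding-window stage from the abstract; window sizes, stride, and the scoring function (a placeholder for the trained shape/motion classifier) are all invented.

```python
# Sketch: enumerate windows at several scales; a real pipeline would score each
# window with the trained classifier and apply selection rules to prune them.
import numpy as np

def sliding_windows(image, sizes=(64, 128, 256), stride_frac=0.5):
    H, W = image.shape[:2]
    for s in sizes:
        step = max(1, int(s * stride_frac))
        for y in range(0, H - s + 1, step):
            for x in range(0, W - s + 1, step):
                yield (y, x, s), image[y:y + s, x:x + s]

image = np.random.rand(512, 512)              # stand-in satellite frame
candidates = [(box, patch.mean())             # placeholder score
              for box, patch in sliding_windows(image)]
print(len(candidates), "windows scored")
```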
Rank-1 Convolutional Neural Network
Title | Rank-1 Convolutional Neural Network |
Authors | Hyein Kim, Jungho Yoon, Byeongseon Jeong, Sukho Lee |
Abstract | In this paper, we propose a convolutional neural network (CNN) with 3-D rank-1 filters formed by the outer product of 1-D filters. After being trained, the 3-D rank-1 filters can be decomposed into 1-D filters at test time for fast inference. We train 3-D rank-1 filters instead of consecutive 1-D filters because this setting yields a better gradient flow, which makes training possible even in cases where a network with consecutive 1-D filters cannot be trained. The 3-D rank-1 filters are updated by both the gradient flow and the outer product of the 1-D filters in every epoch, where the gradient flow tries to obtain a solution which minimizes the loss function, while the outer product operation tries to make the parameters of the filter live on a rank-1 subspace. Furthermore, we show that convolution with the rank-1 filters results in low-rank outputs, constraining the final output of the CNN to also live on a low-dimensional subspace. |
Tasks | |
Published | 2018-08-13 |
URL | http://arxiv.org/abs/1808.04303v1 |
http://arxiv.org/pdf/1808.04303v1.pdf | |
PWC | https://paperswithcode.com/paper/rank-1-convolutional-neural-network |
Repo | |
Framework | |
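A small numpy sketch of the central construction: a 3-D rank-1 filter built as the outer product of three 1-D filters, plus a projection step (here an ALS-style power iteration, our assumption) that returns a perturbed filter to the rank-1 subspace, playing the role of the per-epoch outer-product update.

```python
# Sketch: W = a ⊗ b ⊗ c is rank-1 by construction; after a gradient update
# W drifts off the rank-1 subspace, so the 1-D factors are refit and W rebuilt.
import numpy as np

a, b, c = np.random.rand(3), np.random.rand(5), np.random.rand(5)
W = np.einsum('i,j,k->ijk', a, b, c)          # 3-D rank-1 filter (C x H x W)

def rank1_project(W, iters=20):
    """Best rank-1 approximation via alternating power iteration."""
    a = np.random.rand(W.shape[0])
    b = np.random.rand(W.shape[1])
    c = np.random.rand(W.shape[2])
    for _ in range(iters):
        a = np.einsum('ijk,j,k->i', W, b, c); a /= np.linalg.norm(a)
        b = np.einsum('ijk,i,k->j', W, a, c); b /= np.linalg.norm(b)
        c = np.einsum('ijk,i,j->k', W, a, b); c /= np.linalg.norm(c)
    scale = np.einsum('ijk,i,j,k->', W, a, b, c)
    return scale * np.einsum('i,j,k->ijk', a, b, c)

W = rank1_project(W + 0.01 * np.random.rand(3, 5, 5))  # back onto the subspace
```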
The Many Moods of Emotion
Title | The Many Moods of Emotion |
Authors | Valentin Vielzeuf, Corentin Kervadec, Stéphane Pateux, Frédéric Jurie |
Abstract | This paper presents a novel approach to the facial expression generation problem. Building upon the psychological community's assumption that emotion is intrinsically continuous, we first design our own continuous emotion representation with a 3-dimensional latent space derived from a neural network trained on discrete emotion classification. The resulting representation can be used to annotate large in-the-wild datasets, which are later used to train a Generative Adversarial Network. We first show that our model is able to map back to discrete emotion classes with objectively and subjectively better image quality than usual discrete approaches, and also that we are able to cover the larger space of possible facial expressions, generating the many moods of emotion. Moreover, two axes in this space can be found that generate expression changes similar to those of traditional continuous representations such as arousal-valence. Finally, we show through visual interpretation that the third remaining dimension is highly related to the well-known dominance dimension from psychology. |
Tasks | Emotion Classification |
Published | 2018-10-31 |
URL | http://arxiv.org/abs/1810.13197v1 |
http://arxiv.org/pdf/1810.13197v1.pdf | |
PWC | https://paperswithcode.com/paper/the-many-moods-of-emotion |
Repo | |
Framework | |
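A hedged sketch of the continuous-representation idea, with an invented architecture: a classifier trained on discrete emotion labels but forced through a 3-dimensional bottleneck; after training, the bottleneck activations serve as the continuous emotion code.

```python
# Sketch: the 3-D layer sits between the feature extractor and the discrete
# logits, so it is trained by classification yet yields a continuous latent.
import torch.nn as nn

emotion_net = nn.Sequential(
    nn.Flatten(),
    nn.Linear(64 * 64, 128), nn.ReLU(),   # placeholder feature extractor
    nn.Linear(128, 3),                    # continuous 3-D emotion latent
    nn.Linear(3, 7),                      # discrete emotion logits for training
)
# At annotation time, read the 3-D activations (emotion_net[:4]) rather than
# the final logits, and use them as the GAN's conditioning vector.
```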
Fully automatic structure from motion with a spline-based environment representation
Title | Fully automatic structure from motion with a spline-based environment representation |
Authors | Zhirui Wang, Laurent Kneip |
Abstract | While the common environment representation in structure from motion is given by a sparse point cloud, the community has also investigated the use of lines to better enforce the inherent regularities in man-made surroundings. Following the potential of this idea, the present paper introduces a more flexible higher-order extension of points that provides a general model for structural edges in the environment, no matter if straight or curved. Our model relies on linked Bézier curves, whose geometric intuition provides great benefits during parameter initialization and regularization. We present the first fully automatic pipeline that is able to generate spline-based representations without any human supervision. Besides a full graphical formulation of the problem, we introduce both geometric and photometric cues as well as higher-level concepts such as overall curve visibility and viewing angle restrictions to automatically manage the correspondences in the graph. Results prove that curve-based structure from motion with splines is able to outperform state-of-the-art sparse feature-based methods, as well as to model curved edges in the environment. |
Tasks | |
Published | 2018-10-30 |
URL | http://arxiv.org/abs/1810.12532v1 |
http://arxiv.org/pdf/1810.12532v1.pdf | |
PWC | https://paperswithcode.com/paper/fully-automatic-structure-from-motion-with-a |
Repo | |
Framework | |
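The building block behind the spline representation can be sketched directly: a cubic Bézier segment evaluated through the Bernstein basis. "Linked" curves share endpoint control points so edges stay continuous across segments; the control points below are invented.

```python
# Sketch: evaluate one cubic Bézier segment at n parameter values.
import numpy as np

def cubic_bezier(P, t):
    """P: (4, 3) control points; t: parameters in [0, 1] -> (n, 3) curve points."""
    t = np.asarray(t).reshape(-1, 1)
    B = np.hstack([(1 - t)**3, 3 * t * (1 - t)**2, 3 * t**2 * (1 - t), t**3])
    return B @ P                          # (n, 4) @ (4, 3) -> (n, 3)

P = np.array([[0, 0, 0], [1, 2, 0], [3, 2, 1], [4, 0, 1]], float)
points = cubic_bezier(P, np.linspace(0, 1, 50))   # samples along one edge segment
# A linked follow-on segment would reuse P[3] as its first control point.
```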