January 29, 2020


Paper Group ANR 543

InSituNet: Deep Image Synthesis for Parameter Space Exploration of Ensemble Simulations. One-Shot Weakly Supervised Video Object Segmentation. Efficient Video Semantic Segmentation with Labels Propagation and Refinement. Integer Programming for Learning Directed Acyclic Graphs from Continuous Data. Symmetric block-low-rank layers for fully reversib …

InSituNet: Deep Image Synthesis for Parameter Space Exploration of Ensemble Simulations


Title	InSituNet: Deep Image Synthesis for Parameter Space Exploration of Ensemble Simulations
Authors	Wenbin He, Junpeng Wang, Hanqi Guo, Ko-Chih Wang, Han-Wei Shen, Mukund Raj, Youssef S. G. Nashed, Tom Peterka
Abstract	We propose InSituNet, a deep learning based surrogate model to support parameter space exploration for ensemble simulations that are visualized in situ. In situ visualization, generating visualizations at simulation time, is becoming prevalent in handling large-scale simulations because of the I/O and storage constraints. However, in situ visualization approaches limit the flexibility of post-hoc exploration because the raw simulation data are no longer available. Although multiple image-based approaches have been proposed to mitigate this limitation, those approaches lack the ability to explore the simulation parameters. Our approach allows flexible exploration of parameter space for large-scale ensemble simulations by taking advantage of the recent advances in deep learning. Specifically, we design InSituNet as a convolutional regression model to learn the mapping from the simulation and visualization parameters to the visualization results. With the trained model, users can generate new images for different simulation parameters under various visualization settings, which enables in-depth analysis of the underlying ensemble simulations. We demonstrate the effectiveness of InSituNet in combustion, cosmology, and ocean simulations through quantitative and qualitative evaluations.
Tasks	Image Generation
Published	2019-08-01
URL	https://arxiv.org/abs/1908.00407v3
PDF	https://arxiv.org/pdf/1908.00407v3.pdf
PWC	https://paperswithcode.com/paper/insitunet-deep-image-synthesis-for-parameter
Repo
Framework

One-Shot Weakly Supervised Video Object Segmentation


Title	One-Shot Weakly Supervised Video Object Segmentation
Authors	Mennatullah Siam, Naren Doraiswamy, Boris N. Oreshkin, Hengshuai Yao, Martin Jagersand
Abstract	Conventional few-shot object segmentation methods learn object segmentation from a few labelled support images with strongly labelled segmentation masks. Recent work has been shown to perform on par using weaker levels of supervision, such as scribbles and bounding boxes. However, there has been limited attention given to the problem of few-shot object segmentation with image-level supervision. We propose a novel multi-modal interaction module for few-shot object segmentation that utilizes a co-attention mechanism using both visual and word embeddings. It enables our model to achieve a 5.1% improvement over previously proposed image-level few-shot object segmentation. Our method compares relatively closely to state-of-the-art methods that use strong supervision, while ours uses the least possible supervision. We further propose a novel setup for few-shot weakly supervised video object segmentation (VOS) that relies on image-level labels for the first frame. The proposed setup uses weak annotation, unlike the semi-supervised VOS setting, which utilizes strongly labelled segmentation masks. The setup evaluates the effectiveness of generalizing to novel classes in the VOS setting. The setup splits the VOS data into multiple folds with different categories per fold. It provides a potential setup to evaluate how few-shot object segmentation methods can benefit from additional object poses or object interactions that are not available in static frames, as in the PASCAL-5i benchmark.
Tasks	Semantic Segmentation, Video Object Segmentation, Video Semantic Segmentation, Word Embeddings
Published	2019-12-18
URL	https://arxiv.org/abs/1912.08936v1
PDF	https://arxiv.org/pdf/1912.08936v1.pdf
PWC	https://paperswithcode.com/paper/one-shot-weakly-supervised-video-object
Repo
Framework

Efficient Video Semantic Segmentation with Labels Propagation and Refinement


Title	Efficient Video Semantic Segmentation with Labels Propagation and Refinement
Authors	Matthieu Paul, Christoph Mayer, Luc Van Gool, Radu Timofte
Abstract	This paper tackles the problem of real-time semantic segmentation of high definition videos using a hybrid GPU / CPU approach. We propose an Efficient Video Segmentation (EVS) pipeline that combines: (i) On the CPU, a very fast optical flow method, that is used to exploit the temporal aspect of the video and propagate semantic information from one frame to the next. It runs in parallel with the GPU. (ii) On the GPU, two Convolutional Neural Networks: A main segmentation network that is used to predict dense semantic labels from scratch, and a Refiner that is designed to improve predictions from previous frames with the help of a fast Inconsistencies Attention Module (IAM). The latter can identify regions that cannot be propagated accurately. We suggest several operating points depending on the desired frame rate and accuracy. Our pipeline achieves accuracy levels competitive with the existing real-time methods for semantic image segmentation (mIoU above 60%), while achieving much higher frame rates. On the popular Cityscapes dataset with high resolution frames (2048 x 1024), the proposed operating points range from 80 to 1000 Hz on a single GPU and CPU.
Tasks	Optical Flow Estimation, Real-Time Semantic Segmentation, Semantic Segmentation, Video Semantic Segmentation
Published	2019-12-26
URL	https://arxiv.org/abs/1912.11844v1
PDF	https://arxiv.org/pdf/1912.11844v1.pdf
PWC	https://paperswithcode.com/paper/efficient-video-semantic-segmentation-with
Repo
Framework
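
The abstract above describes propagating semantic labels from one frame to the next with optical flow and flagging regions that cannot be propagated. A minimal Python sketch of that idea follows. It is an illustration under stated assumptions, not the authors' EVS implementation: the function name `propagate_labels`, the integer-valued backward flow convention, and the `UNKNOWN` marker are all hypothetical.

```python
UNKNOWN = -1  # hypothetical marker for pixels whose label could not be propagated


def propagate_labels(prev_labels, backward_flow):
    """Warp the previous frame's label map into the current frame.

    Assumption: backward_flow[y][x] = (dx, dy) is an integer offset that points
    from pixel (x, y) in the current frame to its source pixel in the previous
    frame. Pixels whose source falls outside the image are marked UNKNOWN, so a
    later refinement stage could re-predict them.
    """
    height = len(prev_labels)
    width = len(prev_labels[0])
    out = [[UNKNOWN] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            dx, dy = backward_flow[y][x]
            src_x, src_y = x + dx, y + dy
            if 0 <= src_x < width and 0 <= src_y < height:
                out[y][x] = prev_labels[src_y][src_x]
    return out


# Toy example: the scene shifts one pixel to the right between frames,
# so every current pixel looks one pixel to its left in the previous frame.
prev_labels = [[0, 1, 1],
               [0, 0, 1]]
backward_flow = [[(-1, 0)] * 3 for _ in range(2)]
propagated = propagate_labels(prev_labels, backward_flow)
print(propagated)  # [[-1, 0, 1], [-1, 0, 0]]
```

The leftmost column has no source pixel in the previous frame, which is the kind of region a refinement step would need to fill in.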

Integer Programming for Learning Directed Acyclic Graphs from Continuous Data


Title	Integer Programming for Learning Directed Acyclic Graphs from Continuous Data
Authors	Hasan Manzour, Simge Küçükyavuz, Ali Shojaie
Abstract	Learning directed acyclic graphs (DAGs) from data is a challenging task both in theory and in practice, because the number of possible DAGs scales superexponentially with the number of nodes. In this paper, we study the problem of learning an optimal DAG from continuous observational data. We cast this problem in the form of a mathematical programming model which can naturally incorporate a super-structure in order to reduce the set of possible candidate DAGs. We use the penalized negative log-likelihood score function with both $\ell_0$ and $\ell_1$ regularizations and propose a new mixed-integer quadratic optimization (MIQO) model, referred to as a layered network (LN) formulation. The LN formulation is a compact model, which enjoys as tight an optimal continuous relaxation value as the stronger but larger formulations under a mild condition. Computational results indicate that the proposed formulation outperforms existing mathematical formulations and scales better than available algorithms that can solve the same problem with only $\ell_1$ regularization. In particular, the LN formulation clearly outperforms existing methods in terms of computational time needed to find an optimal DAG in the presence of a sparse super-structure.
Tasks
Published	2019-04-23
URL	http://arxiv.org/abs/1904.10574v1
PDF	http://arxiv.org/pdf/1904.10574v1.pdf
PWC	https://paperswithcode.com/paper/integer-programming-for-learning-directed
Repo
Framework
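
The claim that the number of possible DAGs grows superexponentially with the number of nodes can be checked numerically. The sketch below counts labeled DAGs with Robinson's recurrence, a standard combinatorial result that is not part of the paper's MIQO formulation; the function name `count_dags` is chosen here for illustration.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_dags(n: int) -> int:
    """Number of labeled DAGs on n nodes, via Robinson's recurrence:

    a(n) = sum_{k=1..n} (-1)^(k+1) * C(n, k) * 2^(k*(n-k)) * a(n-k),  a(0) = 1
    """
    if n == 0:
        return 1
    total = 0
    for k in range(1, n + 1):
        sign = 1 if k % 2 == 1 else -1
        total += sign * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
    return total


# First values: 1, 3, 25, 543, 29281
for n in range(1, 6):
    print(n, count_dags(n))
```

Already at five nodes there are 29281 candidate graphs, which is why restricting the search with a super-structure, as the abstract describes, matters in practice.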

Symmetric block-low-rank layers for fully reversible multilevel neural networks


Title	Symmetric block-low-rank layers for fully reversible multilevel neural networks
Authors	Bas Peters, Eldad Haber, Keegan Lensink
Abstract	Factors that limit the size of the input and output of a neural network include memory requirements for the network states/activations to compute gradients, as well as memory for the convolutional kernels or other weights. The memory restriction is especially limiting for applications where we want to learn how to map volumetric data to the desired output, such as video-to-video. Recently developed fully reversible neural networks enable gradient computations using storage of the network states for a couple of layers only. While this saves a tremendous amount of memory, it is the convolutional kernels that take up most memory if fully reversible networks contain multiple invertible pooling/coarsening layers. Invertible coarsening operators such as the orthogonal wavelet transform cause the number of channels to grow explosively. We address this issue by combining fully reversible networks with layers that contain the convolutional kernels in a compressed form directly. Specifically, we introduce a layer that has a symmetric block-low-rank structure. In spirit, this layer is similar to bottleneck and squeeze-and-expand structures. We contribute symmetry by construction, and a combination of notation and flattening of tensors allows us to interpret these network structures in linear algebraic fashion as a block-low-rank matrix in factorized form and observe various properties. A video segmentation example shows that we can train a network to segment the entire video in one go, which would not be possible, in terms of memory requirements, using non-reversible networks and previously proposed reversible networks.
Tasks	Video Semantic Segmentation
Published	2019-12-14
URL	https://arxiv.org/abs/1912.12137v1
PDF	https://arxiv.org/pdf/1912.12137v1.pdf
PWC	https://paperswithcode.com/paper/symmetric-block-low-rank-layers-for-fully
Repo
Framework
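
The abstract above notes that invertible coarsening operators such as the orthogonal wavelet transform make the channel count grow. A small Python sketch of one level of the orthonormal Haar transform in 1D illustrates why: resolution halves, channels double, and the input is exactly recoverable. This is an illustrative toy, not the paper's network layer, and the function names are hypothetical.

```python
import math


def haar_forward(signal):
    """One level of the orthonormal Haar transform on an even-length list.

    Returns two channels (averages, details), each at half the resolution.
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    s = 1.0 / math.sqrt(2.0)
    averages = [s * (signal[i] + signal[i + 1]) for i in range(0, len(signal), 2)]
    details = [s * (signal[i] - signal[i + 1]) for i in range(0, len(signal), 2)]
    return averages, details


def haar_inverse(averages, details):
    """Exactly invert haar_forward (up to floating-point rounding)."""
    s = 1.0 / math.sqrt(2.0)
    signal = []
    for a, d in zip(averages, details):
        signal.append(s * (a + d))
        signal.append(s * (a - d))
    return signal


x = [4.0, 2.0, 5.0, 5.0]
avg, det = haar_forward(x)
reconstructed = haar_inverse(avg, det)
print(len(x), "samples x 1 channel ->", len(avg), "samples x 2 channels")
print(reconstructed)
```

In d spatial dimensions one such level multiplies the channel count by 2^d (4 for images, 8 for volumes or video), so a few coarsening levels quickly make the convolutional kernels the dominant memory cost.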

Sequential image processing methods for improving semantic video segmentation algorithms


Title	Sequential image processing methods for improving semantic video segmentation algorithms
Authors	Beril Sirmacek, Nicolò Botteghi, Santiago Sanchez Escalonilla Plaza
Abstract	Recently, semantic video segmentation has gained considerable attention, especially for supporting autonomous driving systems. Deep learning methods have made it possible to implement real-time segmentation and object identification algorithms on videos. However, most of the available approaches process each video frame independently, disregarding their sequential relation in time. As a result, they can suddenly miss object segments in some frames even if those segments were detected properly in earlier frames. Herein we propose two sequential probabilistic video frame analysis approaches to improve the segmentation performance of the existing algorithms. Our experiments show that, by using information from past frames, we increase the performance and consistency of state-of-the-art algorithms.
Tasks	Autonomous Driving, Video Semantic Segmentation
Published	2019-10-29
URL	https://arxiv.org/abs/1910.13348v1
PDF	https://arxiv.org/pdf/1910.13348v1.pdf
PWC	https://paperswithcode.com/paper/191013348
Repo
Framework

Optimal Algorithms for Submodular Maximization with Distributed Constraints


Title	Optimal Algorithms for Submodular Maximization with Distributed Constraints
Authors	Alexander Robey, Arman Adibi, Brent Schlotfeldt, George J. Pappas, Hamed Hassani
Abstract	We consider a class of discrete optimization problems that aim to maximize a submodular objective function subject to a distributed partition matroid constraint. More precisely, we consider a networked scenario in which multiple agents choose actions from local strategy sets with the goal of maximizing a submodular objective function defined over the set of all possible actions. Given this distributed setting, we develop Constraint-Distributed Continuous Greedy (CDCG), a message passing algorithm that converges to the tight $(1-1/e)$ approximation factor of the optimum global solution using only local computation and communication. It is known that a sequential greedy algorithm can only achieve a $1/2$ multiplicative approximation of the optimal solution for this class of problems in the distributed setting. Our framework relies on lifting the discrete problem to a continuous domain and developing a consensus algorithm that achieves the tight $(1-1/e)$ approximation guarantee of the global discrete solution once a proper rounding scheme is applied. We also offer empirical results from a multi-agent area coverage problem to show that the proposed method significantly outperforms the state-of-the-art sequential greedy method.
Tasks
Published	2019-09-30
URL	https://arxiv.org/abs/1909.13676v2
PDF	https://arxiv.org/pdf/1909.13676v2.pdf
PWC	https://paperswithcode.com/paper/optimal-algorithms-for-submodular
Repo
Framework
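
The abstract above contrasts CDCG with the sequential greedy baseline, which only guarantees a 1/2 approximation under a partition matroid constraint. The Python sketch below shows that baseline (not CDCG itself) on a toy coverage objective where each agent picks exactly one action from its local strategy set. The instance and all names are hypothetical and chosen so that greedy lands on half of the optimum.

```python
from itertools import product


def coverage(chosen_actions):
    """Submodular objective: number of distinct elements covered."""
    covered = set()
    for action in chosen_actions:
        covered |= action
    return len(covered)


def sequential_greedy(strategy_sets):
    """Agents act in order; each picks the action with the largest marginal gain.

    Ties are broken in favour of the first action listed (behaviour of max()).
    """
    chosen = []
    covered = set()
    for actions in strategy_sets:
        best = max(actions, key=lambda action: len(action - covered))
        chosen.append(best)
        covered |= best
    return chosen, len(covered)


# Toy instance: agent 1 can cover "a" or "b"; agent 2 can only cover "a".
strategy_sets = [
    [frozenset({"a"}), frozenset({"b"})],
    [frozenset({"a"})],
]

greedy_choice, greedy_value = sequential_greedy(strategy_sets)
best_value = max(coverage(combo) for combo in product(*strategy_sets))
print("greedy:", greedy_value, "optimum:", best_value)  # greedy: 1 optimum: 2
```

Agent 1 cannot know that agent 2 has no alternative to "a", so its locally optimal tie-break wastes coverage. Exhaustive search over `product(*strategy_sets)` is only feasible for tiny instances like this one.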

Object Segmentation Tracking from Generic Video Cues


Title	Object Segmentation Tracking from Generic Video Cues
Authors	Amirhossein Kardoost, Sabine Müller, Joachim Weickert, Margret Keuper
Abstract	We propose a light-weight variational framework for online tracking of object segmentations in videos based on optical flow and image boundaries. While high-end computer vision methods on this task rely on sequence specific training of dedicated CNN architectures, we show the potential of a variational model, based on generic video information from motion and color. Such cues are usually required for tasks such as robot navigation or grasp estimation. We leverage them directly for video object segmentation and thus provide accurate segmentations at potentially very low extra cost. Furthermore, we show that our approach can be combined with state-of-the-art CNN-based segmentations in order to improve over their respective results. We evaluate our method on the datasets DAVIS16,17 and SegTrack v2.
Tasks	Optical Flow Estimation, Robot Navigation, Semantic Segmentation, Video Object Segmentation, Video Semantic Segmentation
Published	2019-10-05
URL	https://arxiv.org/abs/1910.02258v1
PDF	https://arxiv.org/pdf/1910.02258v1.pdf
PWC	https://paperswithcode.com/paper/object-segmentation-tracking-from-generic
Repo
Framework

FlexNER: A Flexible LSTM-CNN Stack Framework for Named Entity Recognition


Title	FlexNER: A Flexible LSTM-CNN Stack Framework for Named Entity Recognition
Authors	Hongyin Zhu, Wenpeng Hu, Yi Zeng
Abstract	Named entity recognition (NER) is a foundational technology for information extraction. This paper presents a flexible NER framework compatible with different languages and domains. Inspired by the idea of distant supervision (DS), this paper enhances the representation by increasing the entity-context diversity without relying on external resources. We choose different layer stacks and sub-network combinations to construct the bilateral networks. This strategy can generally improve model performance on different datasets. We conduct experiments on five languages (English, German, Spanish, Dutch and Chinese) and on biomedical fields, such as identifying chemicals and gene/protein terms in scientific works. Experimental results demonstrate the good performance of this framework.
Tasks	Named Entity Recognition
Published	2019-08-14
URL	https://arxiv.org/abs/1908.05009v1
PDF	https://arxiv.org/pdf/1908.05009v1.pdf
PWC	https://paperswithcode.com/paper/flexner-a-flexible-lstm-cnn-stack-framework
Repo
Framework

Comparing Energy Efficiency of CPU, GPU and FPGA Implementations for Vision Kernels


Title	Comparing Energy Efficiency of CPU, GPU and FPGA Implementations for Vision Kernels
Authors	Murad Qasaimeh, Kristof Denolf, Jack Lo, Kees Vissers, Joseph Zambreno, Phillip H. Jones
Abstract	Developing high performance embedded vision applications requires balancing run-time performance with energy constraints. Given the mix of hardware accelerators that exist for embedded computer vision (e.g. multi-core CPUs, GPUs, and FPGAs), and their associated vendor optimized vision libraries, it becomes a challenge for developers to navigate this fragmented solution space. To aid with determining which embedded platform is most suitable for their application, we conduct a comprehensive benchmark of the run-time performance and energy efficiency of a wide range of vision kernels. We discuss rationales for why a given underlying hardware architecture innately performs well or poorly based on the characteristics of a range of vision kernel categories. Specifically, our study is performed for three commonly used HW accelerators for embedded vision applications: ARM57 CPU, Jetson TX2 GPU and ZCU102 FPGA, using their vendor optimized vision libraries: OpenCV, VisionWorks and xfOpenCV. Our results show that the GPU achieves an energy/frame reduction ratio of 1.1-3.2x compared to the others for simple kernels. For more complicated kernels and complete vision pipelines, however, the FPGA outperforms the others with energy/frame reduction ratios of 1.2-22.3x. It is also observed that the FPGA performs increasingly better as a vision application’s pipeline complexity grows.
Tasks
Published	2019-05-31
URL	https://arxiv.org/abs/1906.11879v1
PDF	https://arxiv.org/pdf/1906.11879v1.pdf
PWC	https://paperswithcode.com/paper/comparing-energy-efficiency-of-cpu-gpu-and
Repo
Framework

A Preliminary Study on Data Augmentation of Deep Learning for Image Classification


Title	A Preliminary Study on Data Augmentation of Deep Learning for Image Classification
Authors	Benlin Hu, Cheng Lei, Dong Wang, Shu Zhang, Zhenyu Chen
Abstract	Deep learning models have a large number of free parameters that need to be calculated by effective training of the models on a great deal of training data to improve their generalization performance. However, obtaining and labeling data is expensive in practice. Data augmentation is one of the methods to alleviate this problem. In this paper, we conduct a preliminary study on how three variables (augmentation method, augmentation rate and size of basic dataset per label) can affect the accuracy of deep learning for image classification. The study provides some guidelines: (1) it is better to use transformations that alter the geometry of the images rather than those that just alter lighting and color. (2) A 2-3 times augmentation rate is good enough for training. (3) The smaller the amount of data, the more obvious the contribution of augmentation.
Tasks	Data Augmentation, Image Classification
Published	2019-06-09
URL	https://arxiv.org/abs/1906.11887v1
PDF	https://arxiv.org/pdf/1906.11887v1.pdf
PWC	https://paperswithcode.com/paper/a-preliminary-study-on-data-augmentation-of
Repo
Framework
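
The first guideline in the abstract above distinguishes transformations that alter image geometry from those that only alter lighting and color. The Python sketch below shows one toy example of each kind on a grayscale image stored as nested lists. The function names and the choice of transformations are illustrative assumptions, not the augmentation methods evaluated in the study.

```python
def hflip(img):
    """Geometric: mirror the image left to right."""
    return [row[::-1] for row in img]


def rot90_clockwise(img):
    """Geometric: rotate the image by 90 degrees clockwise."""
    return [list(column) for column in zip(*img[::-1])]


def brighten(img, delta):
    """Photometric: shift every pixel intensity, clamped to the 0..255 range."""
    return [[min(255, max(0, pixel + delta)) for pixel in row] for row in img]


image = [[1, 2],
         [3, 4]]

print(hflip(image))            # [[2, 1], [4, 3]]
print(rot90_clockwise(image))  # [[3, 1], [4, 2]]
print(brighten(image, 10))     # [[11, 12], [13, 14]]
```

The geometric transforms move pixels to new positions, while the photometric one keeps every pixel in place and only changes its value.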

Aggregate-Eliminate-Predict: Detecting Adverse Drug Events from Heterogeneous Electronic Health Records


Title	Aggregate-Eliminate-Predict: Detecting Adverse Drug Events from Heterogeneous Electronic Health Records
Authors	Maria Bampa, Panagiotis Papapetrou
Abstract	We study the problem of detecting adverse drug events in electronic healthcare records. The challenge in this work is to aggregate heterogeneous data types involving diagnosis codes, drug codes, as well as lab measurements. An earlier framework proposed for the same problem demonstrated promising predictive performance for the random forest classifier by using only lab measurements as data features. We extend this framework, by additionally including diagnosis and drug prescription codes, concurrently. In addition, we employ a recursive feature selection mechanism on top, that extracts the top-k most important features. Our experimental evaluation on five medical datasets of adverse drug events and six different classifiers, suggests that the integration of these additional features provides substantial and statistically significant improvements in terms of AUC, while employing medically relevant features.
Tasks	Feature Selection
Published	2019-07-13
URL	https://arxiv.org/abs/1907.06058v1
PDF	https://arxiv.org/pdf/1907.06058v1.pdf
PWC	https://paperswithcode.com/paper/aggregate-eliminate-predict-detecting-adverse
Repo
Framework

Persuading Voters: It’s Easy to Whisper, It’s Hard to Speak Loud


Title	Persuading Voters: It’s Easy to Whisper, It’s Hard to Speak Loud
Authors	Matteo Castiglioni, Andrea Celli, Nicola Gatti
Abstract	We focus on the following natural question: is it possible to influence the outcome of a voting process through the strategic provision of information to voters who update their beliefs rationally? We investigate whether it is computationally tractable to design a signaling scheme maximizing the probability with which the sender’s preferred candidate is elected. We focus on the model recently introduced by Arieli and Babichenko (2019) (i.e., without inter-agent externalities), and consider, as explanatory examples, $k$-voting rule and plurality voting. There is a sharp contrast between the case in which private signals are allowed and the more restrictive setting in which only public signals are allowed. In the former, we show that an optimal signaling scheme can be computed efficiently both under a $k$-voting rule and plurality voting. In establishing these results, we provide two general (i.e., applicable to settings beyond voting) contributions. Specifically, we extend a well known result by Dughmi and Xu (2017) to more general settings, and prove that, when the sender’s utility function is anonymous, computing an optimal signaling scheme is fixed parameter tractable w.r.t. the number of receivers’ actions. In the public signaling case, we show that the sender’s optimal expected return cannot be approximated to within any factor under a $k$-voting rule. This negative result easily extends to plurality voting and problems where utility functions are anonymous.
Tasks
Published	2019-08-28
URL	https://arxiv.org/abs/1908.10620v2
PDF	https://arxiv.org/pdf/1908.10620v2.pdf
PWC	https://paperswithcode.com/paper/persuading-voters-its-easy-to-whisper-its
Repo
Framework

Grasp Type Estimation for Myoelectric Prostheses using Point Cloud Feature Learning


Title	Grasp Type Estimation for Myoelectric Prostheses using Point Cloud Feature Learning
Authors	Ghazal Ghazaei, Federico Tombari, Nassir Navab, Kianoush Nazarpour
Abstract	Prosthetic hands can help people with limb difference to return to their life routines. Commercial prostheses, however, have several limitations in providing acceptable dexterity. We approach these limitations by augmenting the prosthetic hands with an off-the-shelf depth sensor to enable the prosthesis to see the object’s depth, record a single view (2.5-D) snapshot, and estimate an appropriate grasp type using a deep network architecture based on 3D point clouds called PointNet. The human can act as the supervisor throughout the procedure by accepting or refusing the suggested grasp type. We achieved a grasp classification accuracy of up to 88%. Contrary to the case of the RGB data, the depth data provides all the necessary object shape information, which is required for grasp recognition. PointNet not only enables using 3-D data in practice, but it also prevents excessive computations. Augmentation of the prosthetic hands with such a semi-autonomous system can lead to better differentiation of grasp types, less burden on the user, and better performance.
Tasks
Published	2019-08-07
URL	https://arxiv.org/abs/1908.02564v1
PDF	https://arxiv.org/pdf/1908.02564v1.pdf
PWC	https://paperswithcode.com/paper/grasp-type-estimation-for-myoelectric
Repo
Framework

Extracting local switching fields in permanent magnets using machine learning


Title	Extracting local switching fields in permanent magnets using machine learning
Authors	Markus Gusenbauer, Harald Oezelt, Johann Fischbacher, Alexander Kovacs, Panpan Zhao, Thomas George Woodcock, Thomas Schrefl
Abstract	Microstructural features play an important role for the quality of permanent magnets. The coercivity is greatly influenced by crystallographic defects, which is well known for MnAl-C, for example. In this work we show a direct link of microstructural features to the local coercivity of MnAl-C grains by machine learning. A large number of micromagnetic simulations is performed directly from Electron Backscatter Diffraction (EBSD) data using an automated meshing, modeling and simulation procedure. Decision trees are trained with the simulation results and predict local switching fields from new microscopic data within seconds.
Tasks
Published	2019-10-21
URL	https://arxiv.org/abs/1910.09279v3
PDF	https://arxiv.org/pdf/1910.09279v3.pdf
PWC	https://paperswithcode.com/paper/extracting-local-switching-fields-in
Repo
Framework