Paper Group ANR 543
InSituNet: Deep Image Synthesis for Parameter Space Exploration of Ensemble Simulations
Title | InSituNet: Deep Image Synthesis for Parameter Space Exploration of Ensemble Simulations |
Authors | Wenbin He, Junpeng Wang, Hanqi Guo, Ko-Chih Wang, Han-Wei Shen, Mukund Raj, Youssef S. G. Nashed, Tom Peterka |
Abstract | We propose InSituNet, a deep-learning-based surrogate model to support parameter space exploration for ensemble simulations that are visualized in situ. In situ visualization, i.e., generating visualizations at simulation time, is becoming prevalent in handling large-scale simulations because of I/O and storage constraints. However, in situ visualization approaches limit the flexibility of post-hoc exploration because the raw simulation data are no longer available. Although multiple image-based approaches have been proposed to mitigate this limitation, they lack the ability to explore the simulation parameters. Our approach allows flexible exploration of the parameter space of large-scale ensemble simulations by taking advantage of recent advances in deep learning. Specifically, we design InSituNet as a convolutional regression model that learns the mapping from the simulation and visualization parameters to the visualization results. With the trained model, users can generate new images for different simulation parameters under various visualization settings, enabling in-depth analysis of the underlying ensemble simulations. We demonstrate the effectiveness of InSituNet in combustion, cosmology, and ocean simulations through quantitative and qualitative evaluations. |
Tasks | Image Generation |
Published | 2019-08-01 |
URL | https://arxiv.org/abs/1908.00407v3 |
https://arxiv.org/pdf/1908.00407v3.pdf | |
PWC | https://paperswithcode.com/paper/insitunet-deep-image-synthesis-for-parameter |
Repo | |
Framework | |
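To make the convolutional regression idea concrete, here is a minimal sketch of a network that maps a parameter vector to an image. The layer sizes, activations, and output resolution are illustrative assumptions, not the authors' architecture:

```python
# A hedged sketch of a parameter-to-image regression network in the spirit of
# InSituNet; every size below is a placeholder, not the paper's design.
import torch
import torch.nn as nn

class ParamToImageNet(nn.Module):
    def __init__(self, n_params: int, img_channels: int = 3):
        super().__init__()
        # Project the parameter vector to a small spatial feature map.
        self.fc = nn.Linear(n_params, 256 * 4 * 4)
        # Upsample to an image with transposed convolutions.
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, img_channels, 4, stride=2, padding=1),
            nn.Tanh(),  # images scaled to [-1, 1]
        )

    def forward(self, params: torch.Tensor) -> torch.Tensor:
        x = self.fc(params).view(-1, 256, 4, 4)
        return self.deconv(x)

net = ParamToImageNet(n_params=5)
fake_params = torch.randn(2, 5)    # a batch of simulation/visualization parameters
print(net(fake_params).shape)      # torch.Size([2, 3, 32, 32])
```

Training such a model on (parameters, rendered image) pairs is what lets users query new parameter settings without re-running the simulation.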
One-Shot Weakly Supervised Video Object Segmentation
Title | One-Shot Weakly Supervised Video Object Segmentation |
Authors | Mennatullah Siam, Naren Doraiswamy, Boris N. Oreshkin, Hengshuai Yao, Martin Jagersand |
Abstract | Conventional few-shot object segmentation methods learn object segmentation from a few support images with strongly labelled segmentation masks. Recent work has shown that weaker levels of supervision, such as scribbles and bounding boxes, can perform on par. However, limited attention has been given to few-shot object segmentation with image-level supervision. We propose a novel multi-modal interaction module for few-shot object segmentation that uses a co-attention mechanism over both visual and word embeddings. It enables our model to achieve a 5.1% improvement over previously proposed image-level few-shot object segmentation. Our method performs close to state-of-the-art methods that use strong supervision, while using the least possible supervision. We further propose a novel setup for few-shot weakly supervised video object segmentation (VOS) that relies on image-level labels for the first frame. The proposed setup uses weak annotations, unlike the semi-supervised VOS setting that utilizes strongly labelled segmentation masks, and evaluates the effectiveness of generalizing to novel classes in the VOS setting. The setup splits the VOS data into multiple folds with different categories per fold. It provides a potential way to evaluate how few-shot object segmentation methods can benefit from additional object poses or object interactions that are not available in static frames, as in the PASCAL-5i benchmark. |
Tasks | Semantic Segmentation, Video Object Segmentation, Video Semantic Segmentation, Word Embeddings |
Published | 2019-12-18 |
URL | https://arxiv.org/abs/1912.08936v1 |
https://arxiv.org/pdf/1912.08936v1.pdf | |
PWC | https://paperswithcode.com/paper/one-shot-weakly-supervised-video-object |
Repo | |
Framework | |
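The co-attention idea can be sketched roughly as follows; the bilinear affinity, feature dimensions, and absence of any fusion head are simplifying assumptions, not the paper's exact module:

```python
# A hedged sketch of visual-word co-attention: an affinity matrix between
# spatial features and word embeddings attends each modality to the other.
import torch
import torch.nn as nn

class CoAttention(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.W = nn.Linear(dim, dim, bias=False)  # bilinear affinity weights

    def forward(self, visual: torch.Tensor, words: torch.Tensor):
        # visual: (B, N, D) flattened spatial features; words: (B, T, D).
        affinity = torch.bmm(self.W(visual), words.transpose(1, 2))   # (B, N, T)
        v_att = torch.softmax(affinity, dim=2) @ words                # words attended per location
        w_att = torch.softmax(affinity, dim=1).transpose(1, 2) @ visual  # locations per word
        return v_att, w_att

block = CoAttention(dim=64)
v, w = block(torch.randn(2, 196, 64), torch.randn(2, 7, 64))
print(v.shape, w.shape)  # torch.Size([2, 196, 64]) torch.Size([2, 7, 64])
```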
Efficient Video Semantic Segmentation with Labels Propagation and Refinement
Title | Efficient Video Semantic Segmentation with Labels Propagation and Refinement |
Authors | Matthieu Paul, Christoph Mayer, Luc Van Gool, Radu Timofte |
Abstract | This paper tackles the problem of real-time semantic segmentation of high-definition videos using a hybrid GPU/CPU approach. We propose an Efficient Video Segmentation (EVS) pipeline that combines: (i) on the CPU, a very fast optical flow method that exploits the temporal aspect of the video and propagates semantic information from one frame to the next; it runs in parallel with the GPU; (ii) on the GPU, two convolutional neural networks: a main segmentation network that predicts dense semantic labels from scratch, and a Refiner that improves predictions from previous frames with the help of a fast Inconsistencies Attention Module (IAM), which can identify regions that cannot be propagated accurately. We suggest several operating points depending on the desired frame rate and accuracy. Our pipeline achieves accuracy competitive with existing real-time methods for semantic image segmentation (mIoU above 60%), while achieving much higher frame rates. On the popular Cityscapes dataset with high-resolution frames (2048 x 1024), the proposed operating points range from 80 to 1000 Hz on a single GPU and CPU. |
Tasks | Optical Flow Estimation, Real-Time Semantic Segmentation, Semantic Segmentation, Video Semantic Segmentation |
Published | 2019-12-26 |
URL | https://arxiv.org/abs/1912.11844v1 |
https://arxiv.org/pdf/1912.11844v1.pdf | |
PWC | https://paperswithcode.com/paper/efficient-video-semantic-segmentation-with |
Repo | |
Framework | |
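Step (i), propagating labels along optical flow, can be sketched with OpenCV. The backward-warping convention and nearest-neighbor sampling below are assumptions rather than the paper's implementation; nearest-neighbor interpolation keeps class labels discrete instead of blending label indices:

```python
# A minimal sketch of flow-based label propagation: warp the previous frame's
# label map along the optical flow to get a cheap prediction for the new frame.
import cv2
import numpy as np

def propagate_labels(labels_prev: np.ndarray, flow: np.ndarray) -> np.ndarray:
    """labels_prev: (H, W) integer labels; flow: (H, W, 2) flow from t-1 to t."""
    h, w = labels_prev.shape
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    # Backward warp: sample frame t-1 labels at positions displaced by -flow.
    map_x = (grid_x - flow[..., 0]).astype(np.float32)
    map_y = (grid_y - flow[..., 1]).astype(np.float32)
    warped = cv2.remap(labels_prev.astype(np.float32), map_x, map_y,
                       interpolation=cv2.INTER_NEAREST)
    return warped.astype(labels_prev.dtype)

# The flow itself could come from a fast CPU method, e.g.:
# flow = cv2.calcOpticalFlowFarneback(prev_gray, cur_gray, None,
#                                     0.5, 3, 15, 3, 5, 1.2, 0)
```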
Integer Programming for Learning Directed Acyclic Graphs from Continuous Data
Title | Integer Programming for Learning Directed Acyclic Graphs from Continuous Data |
Authors | Hasan Manzour, Simge Küçükyavuz, Ali Shojaie |
Abstract | Learning directed acyclic graphs (DAGs) from data is a challenging task both in theory and in practice, because the number of possible DAGs scales superexponentially with the number of nodes. In this paper, we study the problem of learning an optimal DAG from continuous observational data. We cast this problem in the form of a mathematical programming model which can naturally incorporate a super-structure in order to reduce the set of possible candidate DAGs. We use the penalized negative log-likelihood score function with both $\ell_0$ and $\ell_1$ regularizations and propose a new mixed-integer quadratic optimization (MIQO) model, referred to as a layered network (LN) formulation. The LN formulation is a compact model, which enjoys as tight an optimal continuous relaxation value as the stronger but larger formulations under a mild condition. Computational results indicate that the proposed formulation outperforms existing mathematical formulations and scales better than available algorithms that can solve the same problem with only $\ell_1$ regularization. In particular, the LN formulation clearly outperforms existing methods in terms of computational time needed to find an optimal DAG in the presence of a sparse super-structure. |
Tasks | |
Published | 2019-04-23 |
URL | http://arxiv.org/abs/1904.10574v1 |
http://arxiv.org/pdf/1904.10574v1.pdf | |
PWC | https://paperswithcode.com/paper/integer-programming-for-learning-directed |
Repo | |
Framework | |
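The acyclicity mechanism behind a layered-network formulation can be summarized with a small worked constraint set. The following is a hedged reading of the idea from the abstract, not a transcription of the paper's full MIQO model (which additionally carries the penalized negative log-likelihood objective and the $\ell_0$/$\ell_1$ terms):

```latex
% Schematic layered-network (LN) acyclicity constraints, under our reading:
% every node gets a layer value, and a selected arc must go from a lower to a
% higher layer, which rules out directed cycles. Here m is the number of nodes.
\begin{align*}
  z_{ij} &\in \{0,1\} && \text{arc } i \to j \text{ is selected},\\
  \psi_j &\in [1,\, m] && \text{layer value of node } j,\\
  \psi_j - \psi_i &\;\ge\; 1 - m\,(1 - z_{ij}) && \text{a selected arc forces } \psi_j \ge \psi_i + 1.
\end{align*}
```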
Symmetric block-low-rank layers for fully reversible multilevel neural networks
Title | Symmetric block-low-rank layers for fully reversible multilevel neural networks |
Authors | Bas Peters, Eldad Haber, Keegan Lensink |
Abstract | Factors that limit the size of the input and output of a neural network include memory requirements for the network states/activations to compute gradients, as well as memory for the convolutional kernels or other weights. The memory restriction is especially limiting for applications where we want to learn how to map volumetric data to the desired output, such as video-to-video. Recently developed fully reversible neural networks enable gradient computations using storage of the network states for only a couple of layers. While this saves a tremendous amount of memory, it is the convolutional kernels that take up most memory if fully reversible networks contain multiple invertible pooling/coarsening layers. Invertible coarsening operators such as the orthogonal wavelet transform cause the number of channels to grow explosively. We address this issue by combining fully reversible networks with layers that contain the convolutional kernels in a compressed form directly. Specifically, we introduce a layer that has a symmetric block-low-rank structure. In spirit, this layer is similar to bottleneck and squeeze-and-expand structures. The layer is symmetric by construction, and a combination of notation and flattening of tensors allows us to interpret these network structures in a linear-algebraic fashion as a block-low-rank matrix in factorized form and to observe various properties. A video segmentation example shows that we can train a network to segment the entire video in one go, which would not be possible, in terms of memory requirements, using non-reversible networks and previously proposed reversible networks. |
Tasks | Video Semantic Segmentation |
Published | 2019-12-14 |
URL | https://arxiv.org/abs/1912.12137v1 |
https://arxiv.org/pdf/1912.12137v1.pdf | |
PWC | https://paperswithcode.com/paper/symmetric-block-low-rank-layers-for-fully |
Repo | |
Framework | |
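A symmetric layer of this flavor can be sketched in a few lines: sharing one kernel between a convolution and its transpose makes the layer symmetric by construction, and using fewer output than input channels acts as a rank bottleneck. The specific form $y = x - h\,K^\top \sigma(Kx)$, the step size, and the shapes are our assumptions:

```python
# A hedged sketch of a symmetric layer y = x - h * K^T sigma(K x). Reusing the
# same kernel K in conv2d and conv_transpose2d applies K and its adjoint K^T,
# so symmetry holds by construction; the low-rank/block structure of K in the
# paper is only approximated here by a channel bottleneck.
import torch
import torch.nn.functional as F

def symmetric_layer(x: torch.Tensor, K: torch.Tensor, h: float = 0.1) -> torch.Tensor:
    # K: (out_ch, in_ch, 3, 3); out_ch < in_ch gives a rank bottleneck.
    z = torch.relu(F.conv2d(x, K, padding=1))
    return x - h * F.conv_transpose2d(z, K, padding=1)

x = torch.randn(1, 8, 16, 16)
K = torch.randn(4, 8, 3, 3) * 0.1
print(symmetric_layer(x, K).shape)  # torch.Size([1, 8, 16, 16])
```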
Sequential image processing methods for improving semantic video segmentation algorithms
Title | Sequential image processing methods for improving semantic video segmentation algorithms |
Authors | Beril Sirmacek, Nicolò Botteghi, Santiago Sanchez Escalonilla Plaza |
Abstract | Recently, semantic video segmentation has gained considerable attention, especially for supporting autonomous driving systems. Deep learning methods have made it possible to run real-time segmentation and object identification algorithms on videos. However, most available approaches process each video frame independently, disregarding the frames' sequential relation in time. As a result, their outputs can suddenly miss object segments in some frames even when those objects were detected properly in earlier frames. Herein we propose two sequential probabilistic video frame analysis approaches to improve the segmentation performance of existing algorithms. Our experiments show that using information from past frames increases the performance and consistency of state-of-the-art algorithms. |
Tasks | Autonomous Driving, Video Semantic Segmentation |
Published | 2019-10-29 |
URL | https://arxiv.org/abs/1910.13348v1 |
https://arxiv.org/pdf/1910.13348v1.pdf | |
PWC | https://paperswithcode.com/paper/191013348 |
Repo | |
Framework | |
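As a rough illustration of fusing past and current frames (the paper's specific probabilistic models are not reproduced here), one could smooth per-pixel class probabilities over time; the exponential-averaging scheme below is an assumed, illustrative choice:

```python
# A hedged sketch of temporal fusion: blend the current frame's class
# probabilities with a running history so segments do not flicker in and out.
import numpy as np

def temporal_smooth(prob_t: np.ndarray, prob_hist: np.ndarray,
                    alpha: float = 0.7) -> np.ndarray:
    """prob_t, prob_hist: (H, W, C) per-pixel class probabilities."""
    fused = alpha * prob_t + (1.0 - alpha) * prob_hist
    return fused / fused.sum(axis=-1, keepdims=True)  # renormalize per pixel
```

A higher alpha trusts the current frame more; a lower alpha enforces more temporal consistency at the cost of slower reaction to genuinely new objects.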
Optimal Algorithms for Submodular Maximization with Distributed Constraints
Title | Optimal Algorithms for Submodular Maximization with Distributed Constraints |
Authors | Alexander Robey, Arman Adibi, Brent Schlotfeldt, George J. Pappas, Hamed Hassani |
Abstract | We consider a class of discrete optimization problems that aim to maximize a submodular objective function subject to a distributed partition matroid constraint. More precisely, we consider a networked scenario in which multiple agents choose actions from local strategy sets with the goal of maximizing a submodular objective function defined over the set of all possible actions. Given this distributed setting, we develop Constraint-Distributed Continuous Greedy (CDCG), a message passing algorithm that converges to the tight $(1-1/e)$ approximation factor of the optimum global solution using only local computation and communication. It is known that a sequential greedy algorithm can only achieve a $1/2$ multiplicative approximation of the optimal solution for this class of problems in the distributed setting. Our framework relies on lifting the discrete problem to a continuous domain and developing a consensus algorithm that achieves the tight $(1-1/e)$ approximation guarantee of the global discrete solution once a proper rounding scheme is applied. We also offer empirical results from a multi-agent area coverage problem to show that the proposed method significantly outperforms the state-of-the-art sequential greedy method. |
Tasks | |
Published | 2019-09-30 |
URL | https://arxiv.org/abs/1909.13676v2 |
https://arxiv.org/pdf/1909.13676v2.pdf | |
PWC | https://paperswithcode.com/paper/optimal-algorithms-for-submodular |
Repo | |
Framework | |
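For contrast with CDCG, the sequential greedy baseline mentioned in the abstract (the one limited to a 1/2 approximation) is easy to sketch; the objective f and the toy coverage example are our own illustration, under the assumption that each agent picks one action:

```python
# A sketch of the sequential greedy baseline for a distributed partition
# matroid: each agent in turn picks the action from its local set with the
# best marginal gain. f is any submodular set function supplied by the caller.
from typing import Callable, FrozenSet, Hashable, Sequence

def sequential_greedy(f: Callable[[FrozenSet], float],
                      local_sets: Sequence[Sequence[Hashable]]) -> FrozenSet:
    chosen: FrozenSet = frozenset()
    for actions in local_sets:                     # one pass over the agents
        gains = [(f(chosen | {a}) - f(chosen), a) for a in actions]
        best_gain, best_action = max(gains)
        if best_gain > 0:
            chosen = chosen | {best_action}
    return chosen

# Example: coverage of ground elements, a classic submodular objective;
# each action is a tuple of the elements it covers.
cover = lambda S: float(len(set().union(*S))) if S else 0.0
print(sequential_greedy(cover, [[(1, 2), (3,)], [(2, 3), (4, 5)]]))
```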
Object Segmentation Tracking from Generic Video Cues
Title | Object Segmentation Tracking from Generic Video Cues |
Authors | Amirhossein Kardoost, Sabine Müller, Joachim Weickert, Margret Keuper |
Abstract | We propose a light-weight variational framework for online tracking of object segmentations in videos based on optical flow and image boundaries. While high-end computer vision methods on this task rely on sequence-specific training of dedicated CNN architectures, we show the potential of a variational model based on generic video information from motion and color. Such cues are usually required for tasks such as robot navigation or grasp estimation, and we leverage them directly for video object segmentation, thus providing accurate segmentations at potentially very low extra cost. Furthermore, we show that our approach can be combined with state-of-the-art CNN-based segmentations in order to improve over their respective results. We evaluate our method on the DAVIS16, DAVIS17, and SegTrack v2 datasets. |
Tasks | Optical Flow Estimation, Robot Navigation, Semantic Segmentation, Video Object Segmentation, Video Semantic Segmentation |
Published | 2019-10-05 |
URL | https://arxiv.org/abs/1910.02258v1 |
https://arxiv.org/pdf/1910.02258v1.pdf | |
PWC | https://paperswithcode.com/paper/object-segmentation-tracking-from-generic |
Repo | |
Framework | |
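Since the method is evaluated on DAVIS, a small helper for the standard region-similarity measure J (intersection-over-union between predicted and ground-truth masks) is useful context; this is the standard definition, not code from the paper:

```python
# Region similarity J, the per-frame IoU between a predicted and a
# ground-truth binary object mask, as used on the DAVIS benchmarks.
import numpy as np

def region_similarity(pred: np.ndarray, gt: np.ndarray) -> float:
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    return 1.0 if union == 0 else float(np.logical_and(pred, gt).sum() / union)
```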
FlexNER: A Flexible LSTM-CNN Stack Framework for Named Entity Recognition
Title | FlexNER: A Flexible LSTM-CNN Stack Framework for Named Entity Recognition |
Authors | Hongyin Zhu, Wenpeng Hu, Yi Zeng |
Abstract | Named entity recognition (NER) is a foundational technology for information extraction. This paper presents a flexible NER framework compatible with different languages and domains. Inspired by the idea of distant supervision (DS), this paper enhances the representation by increasing the entity-context diversity without relying on external resources. We choose different layer stacks and sub-network combinations to construct the bilateral networks. This strategy can generally improve model performance on different datasets. We conduct experiments on five languages (English, German, Spanish, Dutch, and Chinese) and on biomedical text, identifying chemical and gene/protein terms in scientific works. Experimental results demonstrate the good performance of this framework. |
Tasks | Named Entity Recognition |
Published | 2019-08-14 |
URL | https://arxiv.org/abs/1908.05009v1 |
https://arxiv.org/pdf/1908.05009v1.pdf | |
PWC | https://paperswithcode.com/paper/flexner-a-flexible-lstm-cnn-stack-framework |
Repo | |
Framework | |
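A generic LSTM-CNN stack for sequence tagging, sketched from the abstract, might look as follows; the bilateral-network combinations and distant-supervision machinery of FlexNER are not reproduced, and all sizes are placeholders:

```python
# A hedged skeleton of an LSTM-CNN tagger: a convolution captures local
# character/context patterns, a BiLSTM captures long-range dependencies,
# and a linear head emits per-token tag scores.
import torch
import torch.nn as nn

class LstmCnnTagger(nn.Module):
    def __init__(self, vocab: int, emb: int = 100, hidden: int = 128, tags: int = 9):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.conv = nn.Conv1d(emb, emb, kernel_size=3, padding=1)  # local context
        self.lstm = nn.LSTM(emb, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, tags)                     # per-token tag scores

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        x = self.embed(tokens)                                     # (B, T, emb)
        x = torch.relu(self.conv(x.transpose(1, 2))).transpose(1, 2)
        h, _ = self.lstm(x)
        return self.out(h)                                         # (B, T, tags)

model = LstmCnnTagger(vocab=20000)
print(model(torch.randint(0, 20000, (2, 12))).shape)  # torch.Size([2, 12, 9])
```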
Comparing Energy Efficiency of CPU, GPU and FPGA Implementations for Vision Kernels
Title | Comparing Energy Efficiency of CPU, GPU and FPGA Implementations for Vision Kernels |
Authors | Murad Qasaimeh, Kristof Denolf, Jack Lo, Kees Vissers, Joseph Zambreno, Phillip H. Jones |
Abstract | Developing high-performance embedded vision applications requires balancing run-time performance with energy constraints. Given the mix of hardware accelerators that exist for embedded computer vision (e.g. multi-core CPUs, GPUs, and FPGAs), and their associated vendor-optimized vision libraries, it becomes a challenge for developers to navigate this fragmented solution space. To aid with determining which embedded platform is most suitable for their application, we conduct a comprehensive benchmark of the run-time performance and energy efficiency of a wide range of vision kernels. We discuss rationales for why a given underlying hardware architecture innately performs well or poorly based on the characteristics of a range of vision kernel categories. Specifically, our study is performed for three commonly used HW accelerators for embedded vision applications: the ARM57 CPU, the Jetson TX2 GPU, and the ZCU102 FPGA, using their vendor-optimized vision libraries: OpenCV, VisionWorks, and xfOpenCV. Our results show that the GPU achieves an energy/frame reduction ratio of 1.1-3.2x compared to the others for simple kernels, while for more complicated kernels and complete vision pipelines, the FPGA outperforms the others with energy/frame reduction ratios of 1.2-22.3x. We also observe that the FPGA performs increasingly better as a vision application’s pipeline complexity grows. |
Tasks | |
Published | 2019-05-31 |
URL | https://arxiv.org/abs/1906.11879v1 |
https://arxiv.org/pdf/1906.11879v1.pdf | |
PWC | https://paperswithcode.com/paper/comparing-energy-efficiency-of-cpu-gpu-and |
Repo | |
Framework | |
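The energy/frame metric behind these reduction ratios is straightforward to compute; the power and frame-rate numbers below are placeholders, not measurements from the study:

```python
# Energy per frame = average power / throughput; the reduction ratio compares
# two platforms on the same kernel. All numbers here are illustrative.
def energy_per_frame(avg_power_w: float, fps: float) -> float:
    return avg_power_w / fps                    # joules per frame

gpu = energy_per_frame(avg_power_w=9.0, fps=120.0)
fpga = energy_per_frame(avg_power_w=5.0, fps=200.0)
print(f"reduction ratio GPU/FPGA: {gpu / fpga:.1f}x")
```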
A Preliminary Study on Data Augmentation of Deep Learning for Image Classification
Title | A Preliminary Study on Data Augmentation of Deep Learning for Image Classification |
Authors | Benlin Hu, Cheng Lei, Dong Wang, Shu Zhang, Zhenyu Chen |
Abstract | Deep learning models have a large number of free parameters that must be estimated by effectively training the models on a great deal of training data to improve their generalization performance. However, obtaining and labeling data is expensive in practice. Data augmentation is one of the methods to alleviate this problem. In this paper, we conduct a preliminary study on how three variables (augmentation method, augmentation rate, and size of the basic dataset per label) affect the accuracy of deep learning for image classification. The study provides some guidelines: (1) it is better to use transformations that alter the geometry of the images rather than those that alter just lighting and color; (2) an augmentation rate of 2-3x is good enough for training; (3) the smaller the amount of data, the more obvious the contribution of augmentation. |
Tasks | Data Augmentation, Image Classification |
Published | 2019-06-09 |
URL | https://arxiv.org/abs/1906.11887v1 |
https://arxiv.org/pdf/1906.11887v1.pdf | |
PWC | https://paperswithcode.com/paper/a-preliminary-study-on-data-augmentation-of |
Repo | |
Framework | |
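Guideline (1) can be illustrated with a geometric augmentation pipeline; torchvision is an assumed choice here, and the exact transforms and rates studied in the paper are not reproduced:

```python
# A hedged example of geometry-altering augmentations (flips, rotations,
# crops), preferred by the study over pure lighting/color changes.
from torchvision import transforms

geometric_aug = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomRotation(degrees=15),
    transforms.RandomResizedCrop(size=32, scale=(0.8, 1.0)),
    transforms.ToTensor(),
])
# Usage: augmented = geometric_aug(pil_image)
```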
Aggregate-Eliminate-Predict: Detecting Adverse Drug Events from Heterogeneous Electronic Health Records
Title | Aggregate-Eliminate-Predict: Detecting Adverse Drug Events from Heterogeneous Electronic Health Records |
Authors | Maria Bampa, Panagiotis Papapetrou |
Abstract | We study the problem of detecting adverse drug events in electronic healthcare records. The challenge in this work is to aggregate heterogeneous data types involving diagnosis codes, drug codes, as well as lab measurements. An earlier framework proposed for the same problem demonstrated promising predictive performance for the random forest classifier by using only lab measurements as data features. We extend this framework by additionally including diagnosis and drug prescription codes, concurrently. In addition, we employ a recursive feature selection mechanism on top that extracts the top-k most important features. Our experimental evaluation on five medical datasets of adverse drug events and six different classifiers suggests that the integration of these additional features provides substantial and statistically significant improvements in terms of AUC, while employing medically relevant features. |
Tasks | Feature Selection |
Published | 2019-07-13 |
URL | https://arxiv.org/abs/1907.06058v1 |
https://arxiv.org/pdf/1907.06058v1.pdf | |
PWC | https://paperswithcode.com/paper/aggregate-eliminate-predict-detecting-adverse |
Repo | |
Framework | |
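The top-k recursive feature selection around a random forest can be sketched with scikit-learn; the feature groups, their widths, and k are placeholders:

```python
# A hedged sketch of the described pipeline: concatenate heterogeneous feature
# groups, then keep the top-k features by recursive elimination around a
# random forest. The data here is synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

X = np.hstack([np.random.rand(100, 20),   # lab measurements (placeholder)
               np.random.rand(100, 30),   # diagnosis codes (placeholder)
               np.random.rand(100, 25)])  # drug prescription codes (placeholder)
y = np.random.randint(0, 2, 100)          # adverse-drug-event label

selector = RFE(RandomForestClassifier(n_estimators=100), n_features_to_select=15)
selector.fit(X, y)
print(selector.support_.sum())            # 15 features retained
```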
Persuading Voters: It’s Easy to Whisper, It’s Hard to Speak Loud
Title | Persuading Voters: It’s Easy to Whisper, It’s Hard to Speak Loud |
Authors | Matteo Castiglioni, Andrea Celli, Nicola Gatti |
Abstract | We focus on the following natural question: is it possible to influence the outcome of a voting process through the strategic provision of information to voters who update their beliefs rationally? We investigate whether it is computationally tractable to design a signaling scheme maximizing the probability with which the sender’s preferred candidate is elected. We focus on the model recently introduced by Arieli and Babichenko (2019) (i.e., without inter-agent externalities), and consider, as explanatory examples, $k$-voting rule and plurality voting. There is a sharp contrast between the case in which private signals are allowed and the more restrictive setting in which only public signals are allowed. In the former, we show that an optimal signaling scheme can be computed efficiently both under a $k$-voting rule and plurality voting. In establishing these results, we provide two general (i.e., applicable to settings beyond voting) contributions. Specifically, we extend a well known result by Dughmi and Xu (2017) to more general settings, and prove that, when the sender’s utility function is anonymous, computing an optimal signaling scheme is fixed parameter tractable w.r.t. the number of receivers’ actions. In the public signaling case, we show that the sender’s optimal expected return cannot be approximated to within any factor under a $k$-voting rule. This negative result easily extends to plurality voting and problems where utility functions are anonymous. |
Tasks | |
Published | 2019-08-28 |
URL | https://arxiv.org/abs/1908.10620v2 |
https://arxiv.org/pdf/1908.10620v2.pdf | |
PWC | https://paperswithcode.com/paper/persuading-voters-its-easy-to-whisper-its |
Repo | |
Framework | |
Grasp Type Estimation for Myoelectric Prostheses using Point Cloud Feature Learning
Title | Grasp Type Estimation for Myoelectric Prostheses using Point Cloud Feature Learning |
Authors | Ghazal Ghazaei, Federico Tombari, Nassir Navab, Kianoush Nazarpour |
Abstract | Prosthetic hands can help people with limb difference return to their life routines. Commercial prostheses, however, have several limitations in providing acceptable dexterity. We approach these limitations by augmenting the prosthetic hands with an off-the-shelf depth sensor to enable the prosthesis to see the object’s depth, record a single-view (2.5-D) snapshot, and estimate an appropriate grasp type, using a deep network architecture based on 3D point clouds called PointNet. The human can act as the supervisor throughout the procedure by accepting or refusing the suggested grasp type. We achieved a grasp classification accuracy of up to 88%. In contrast to RGB data, the depth data provides all the necessary object shape information required for grasp recognition. PointNet not only makes using 3-D data practical but also avoids excessive computation. Augmenting prosthetic hands with such a semi-autonomous system can lead to better differentiation of grasp types, less burden on the user, and better performance. |
Tasks | |
Published | 2019-08-07 |
URL | https://arxiv.org/abs/1908.02564v1 |
https://arxiv.org/pdf/1908.02564v1.pdf | |
PWC | https://paperswithcode.com/paper/grasp-type-estimation-for-myoelectric |
Repo | |
Framework | |
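A compact PointNet-style classifier in the spirit described (shared per-point MLP, order-invariant max pooling, then a grasp-type head) might look as follows; the layer sizes and number of grasp classes are assumptions:

```python
# A hedged sketch of PointNet-style grasp-type classification from a
# single-view (2.5-D) point cloud; not the authors' exact network.
import torch
import torch.nn as nn

class PointNetGrasp(nn.Module):
    def __init__(self, n_grasp_types: int = 5):
        super().__init__()
        self.point_mlp = nn.Sequential(       # shared across points via 1x1 Conv1d
            nn.Conv1d(3, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.ReLU(),
        )
        self.head = nn.Linear(128, n_grasp_types)

    def forward(self, pts: torch.Tensor) -> torch.Tensor:
        # pts: (B, 3, N) point cloud from the depth snapshot.
        feat = self.point_mlp(pts)            # (B, 128, N)
        global_feat = feat.max(dim=2).values  # order-invariant pooling
        return self.head(global_feat)         # grasp-type logits

net = PointNetGrasp()
print(net(torch.randn(2, 3, 1024)).shape)     # torch.Size([2, 5])
```

The max pooling is what makes the prediction independent of point ordering, which is the core PointNet property the abstract relies on.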
Extracting local switching fields in permanent magnets using machine learning
Title | Extracting local switching fields in permanent magnets using machine learning |
Authors | Markus Gusenbauer, Harald Oezelt, Johann Fischbacher, Alexander Kovacs, Panpan Zhao, Thomas George Woodcock, Thomas Schrefl |
Abstract | Microstructural features play an important role in the quality of permanent magnets. The coercivity is greatly influenced by crystallographic defects, as is well known for MnAl-C, for example. In this work we show a direct link between microstructural features and the local coercivity of MnAl-C grains by machine learning. A large number of micromagnetic simulations are performed directly from Electron Backscatter Diffraction (EBSD) data using an automated meshing, modeling, and simulation procedure. Decision trees are trained on the simulation results and predict local switching fields from new microscopic data within seconds. |
Tasks | |
Published | 2019-10-21 |
URL | https://arxiv.org/abs/1910.09279v3 |
https://arxiv.org/pdf/1910.09279v3.pdf | |
PWC | https://paperswithcode.com/paper/extracting-local-switching-fields-in |
Repo | |
Framework | |
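The final step, training decision trees on simulation-derived features to predict local switching fields, can be sketched with scikit-learn; the feature columns and targets below are synthetic placeholders:

```python
# A hedged sketch of the surrogate step: fit a decision tree on
# microstructural features from the micromagnetic simulations, then predict
# switching fields for new grains. All data here is synthetic.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

X = np.random.rand(500, 6)     # e.g. grain size, misorientation, ... (placeholders)
y = np.random.rand(500) * 2.0  # simulated switching field in tesla (placeholder)

model = DecisionTreeRegressor(max_depth=6).fit(X, y)
print(model.predict(X[:3]))    # near-instant predictions for new grains
```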