Paper Group ANR 1602
Performance Evaluation of Two-layer lossless HDR Coding using Histogram Packing Technique under Various Tone-mapping Operators
Title | Performance Evaluation of Two-layer lossless HDR Coding using Histogram Packing Technique under Various Tone-mapping Operators |
Authors | Hiroyuki Kobayashi, Hitoshi Kiya |
Abstract | We proposed a lossless two-layer HDR coding method using a histogram packing technique. The proposed method was demonstrated to outperform the normative JPEG XT encoder when the default tone-mapping operator is used. However, its performance under other tone-mapping operators has not been discussed. In this paper, we compare the performance of the proposed method with that of the JPEG XT encoder under various tone-mapping operators to clearly show the characteristic differences between them. |
Tasks | |
Published | 2019-07-25 |
URL | https://arxiv.org/abs/1907.10889v1 |
https://arxiv.org/pdf/1907.10889v1.pdf | |
PWC | https://paperswithcode.com/paper/performance-evaluation-of-two-layer-lossless |
Repo | |
Framework | |
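A minimal NumPy sketch of the histogram-packing idea behind the method above: HDR images typically use only a sparse subset of the available code values, and mapping that subset onto a contiguous integer range improves lossless compression while remaining exactly invertible. Function names are illustrative; this is not the paper's JPEG XT pipeline.

```python
import numpy as np

def pack_histogram(image):
    """Map the sparse set of values actually used in `image` onto the
    consecutive integers 0..K-1; return the packed image plus the lookup
    table needed for exact (lossless) inversion."""
    lut = np.unique(image)                # sorted unique values present
    packed = np.searchsorted(lut, image)  # index of each pixel in the LUT
    return packed.astype(np.uint16), lut

def unpack_histogram(packed, lut):
    """Exactly invert pack_histogram."""
    return lut[packed]

# A 16-bit image that uses only a few scattered code values:
img = np.random.choice([0, 1000, 1003, 40000, 65535], size=(4, 4)).astype(np.uint16)
packed, lut = pack_histogram(img)
assert np.array_equal(unpack_histogram(packed, lut), img)  # lossless round trip
```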
Domain Adaptive Transfer Learning for Fault Diagnosis
Title | Domain Adaptive Transfer Learning for Fault Diagnosis |
Authors | Qin Wang, Gabriel Michau, Olga Fink |
Abstract | Thanks to the digitization of industrial assets in fleets, the ambitious goal of transferring fault diagnosis models from one machine to another has raised great interest. Solving these domain adaptive transfer learning tasks has the potential to save the large effort of manually labeling data and modifying models for new machines in the same fleet. Although data-driven methods have shown great potential in fault diagnosis applications, their ability to generalize to new machines and new working conditions is limited, because in practice they tend to overfit to the training set. One promising solution to this problem is domain adaptation, which aims to improve model performance on the target new machine. Inspired by its successful implementation in computer vision, we introduce Domain-Adversarial Neural Networks (DANN) to our context, along with two other popular methods from previous fault diagnosis research. We then carefully justify the applicability of these methods in realistic fault diagnosis settings, and offer a unified experimental protocol for a fair comparison between domain adaptation methods for fault diagnosis problems. |
Tasks | Domain Adaptation, Transfer Learning |
Published | 2019-05-15 |
URL | https://arxiv.org/abs/1905.06004v1 |
https://arxiv.org/pdf/1905.06004v1.pdf | |
PWC | https://paperswithcode.com/paper/domain-adaptive-transfer-learning-for-fault |
Repo | |
Framework | |
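DANN's core mechanism is a gradient reversal layer: features are trained to fool a domain classifier that tries to tell the source machine from the target machine. A minimal PyTorch sketch of that mechanism, with hypothetical layer sizes rather than the paper's fault-diagnosis architecture:

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; multiplies the gradient by -lambda in
    the backward pass, turning the feature extractor adversarial."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

class DANN(nn.Module):
    def __init__(self, n_features, n_classes, hidden=64):
        super().__init__()
        self.feature = nn.Sequential(nn.Linear(n_features, hidden), nn.ReLU())
        self.label_clf = nn.Linear(hidden, n_classes)  # fault classes
        self.domain_clf = nn.Linear(hidden, 2)         # source vs. target machine

    def forward(self, x, lambd=1.0):
        h = self.feature(x)
        return self.label_clf(h), self.domain_clf(GradReverse.apply(h, lambd))
```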
Self Learning from Large Scale Code Corpus to Infer Structure of Method Invocations
Title | Self Learning from Large Scale Code Corpus to Infer Structure of Method Invocations |
Authors | Hung Phan |
Abstract | Automatically generating code from a textual description of a method invocation poses challenges. There are two current research directions for this problem. One direction treats the textual description of a method invocation as a standalone natural language query and does not consider the surrounding context of the code. The other direction takes advantage of a practical large-scale code corpus to train a Machine Translation model that generates code; however, it achieves very low accuracy. In this work, we address these drawbacks by proposing MethodInfoToCode, an approach that embeds context information and optimizes the learning ability of the original Phrase-based Statistical Machine Translation (PBMT) from NLP to infer the implementation of a method invocation given the method name and other context information. We train expression prediction models on 2.86 million method invocations drawn from a high-quality corpus of practical GitHub projects that use 6 popular libraries: JDK, Android, GWT, Joda-Time, Hibernate, and XStream. Our evaluation shows that if developers write only the method name of a method invocation in a method body, MethodInfoToCode predicts the generated expression correctly with an F1 score of 73%. |
Tasks | Machine Translation |
Published | 2019-09-06 |
URL | https://arxiv.org/abs/1909.03147v1 |
https://arxiv.org/pdf/1909.03147v1.pdf | |
PWC | https://paperswithcode.com/paper/self-learning-from-large-scale-code-corpus-to |
Repo | |
Framework | |
Topological Feature Vectors for Chatter Detection in Turning Processes
Title | Topological Feature Vectors for Chatter Detection in Turning Processes |
Authors | Melih C. Yesilli, Firas A. Khasawneh, Andreas Otto |
Abstract | Machining processes are most accurately described using complex dynamical systems that include nonlinearities, time delays and stochastic effects. Due to the nature of these models as well as the practical challenges which include time-varying parameters, the transition from numerical/analytical modeling of machining to the analysis of real cutting signals remains challenging. Some studies have focused on studying the time series of cutting processes using machine learning algorithms with the goal of identifying and predicting undesirable vibrations during machining, referred to as chatter. These tools typically decompose the signal using Wavelet Packet Transforms (WPT) or Ensemble Empirical Mode Decomposition (EEMD). However, these methods require a significant overhead in identifying the feature vectors before a classifier can be trained. In this study, we present an alternative approach based on featurizing the time series of the cutting process using its topological features. We utilize a support vector machine classifier combined with feature vectors derived from persistence diagrams, a tool from persistent homology, to encode distinguishing characteristics based on embedding the time series as a point cloud using Takens' embedding. We present the results for several choices of the topological feature vectors, and we compare our results to the WPT and EEMD methods using experimental time series from a turning cutting test. Our results show that in most cases combining the Topological Data Analysis (TDA) based features with a simple Support Vector Machine (SVM) yields accuracies that either exceed or are within the error bounds of their WPT and EEMD counterparts. |
Tasks | Time Series |
Published | 2019-05-21 |
URL | https://arxiv.org/abs/1905.08671v2 |
https://arxiv.org/pdf/1905.08671v2.pdf | |
PWC | https://paperswithcode.com/paper/topological-feature-vectors-for-chatter |
Repo | |
Framework | |
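The pipeline above embeds each cutting signal as a point cloud via Takens' delay embedding, computes its persistence diagram, and feeds features derived from the diagram to an SVM. A minimal sketch, assuming the `ripser` and `scikit-learn` packages; the lifetime-based features shown are one simple choice, not necessarily the vectors used in the paper:

```python
import numpy as np
from ripser import ripser        # persistent homology of a point cloud
from sklearn.svm import SVC

def takens_embedding(ts, dim=3, tau=5):
    """Embed a 1D time series in R^dim using delay coordinates."""
    n = len(ts) - (dim - 1) * tau
    return np.stack([ts[i * tau : i * tau + n] for i in range(dim)], axis=1)

def persistence_features(ts, dim=3, tau=5):
    """A simple topological feature vector: total and maximum persistence
    of the 1D homology classes (loops) of the embedded point cloud."""
    dgm1 = ripser(takens_embedding(ts, dim, tau), maxdim=1)['dgms'][1]
    lifetimes = dgm1[:, 1] - dgm1[:, 0] if len(dgm1) else np.zeros(1)
    return np.array([lifetimes.sum(), lifetimes.max()])

# signals: list of 1D cutting-signal arrays; labels: 1 = chatter, 0 = stable
# X = np.stack([persistence_features(s) for s in signals])
# clf = SVC(kernel='rbf').fit(X, labels)
```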
On Robustness of Principal Component Regression
Title | On Robustness of Principal Component Regression |
Authors | Anish Agarwal, Devavrat Shah, Dennis Shen, Dogyoon Song |
Abstract | Principal Component Regression (PCR) is a simple, but powerful and ubiquitously utilized method. Its effectiveness is well established when the covariates exhibit low-rank structure. However, its ability to handle settings with noisy, missing, and mixed (discrete and continuous) valued covariates is not understood and remains an important open challenge. As the main contribution of this work, we establish the robustness of PCR (without any change) in this respect and provide meaningful finite-sample analysis. In the process, we establish that PCR is equivalent to performing Linear Regression after pre-processing the covariate matrix via Hard Singular Value Thresholding (HSVT). That is, PCR is equivalent to the recently proposed robust variant of the Synthetic Control method in the context of counterfactual analysis using observational data. As an immediate consequence, we obtain finite-sample analysis of the Robust Synthetic Control (RSC) estimator that was previously absent. As an important contribution to the Synthetic Control literature, we establish that an (approximate linear) synthetic control always exists in the setting of a generalized factor model (or latent variable model) and need not be assumed as an axiom as is traditionally done in the literature. We further discuss a surprising implication of the robustness property of PCR with respect to noise, i.e., PCR can learn a good predictive model even if the covariates are tactfully transformed to preserve (differential) privacy. Finally, this work advances the state-of-the-art analysis for HSVT by establishing stronger guarantees with respect to the $\ell_{2, \infty}$-norm rather than the Frobenius norm as is commonly done in the matrix estimation literature, which may be of interest in its own right. |
Tasks | Causal Inference, Time Series, Time Series Analysis |
Published | 2019-02-28 |
URL | https://arxiv.org/abs/1902.10920v7 |
https://arxiv.org/pdf/1902.10920v7.pdf | |
PWC | https://paperswithcode.com/paper/model-agnostic-high-dimensional-error-in |
Repo | |
Framework | |
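The equivalence at the heart of the paper — PCR is ordinary least squares after Hard Singular Value Thresholding of the covariate matrix — is compact enough to state in code. A minimal NumPy sketch, with the rank chosen by hand for illustration:

```python
import numpy as np

def hsvt(X, rank):
    """Hard Singular Value Thresholding: keep only the top `rank` singular values."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]

def pcr(X, y, rank):
    """PCR in its 'denoise-then-regress' form: HSVT on the covariates,
    then (minimum-norm) least squares."""
    beta, *_ = np.linalg.lstsq(hsvt(X, rank), y, rcond=None)
    return beta

# Noisy covariates with a rank-2 signal:
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2)) @ rng.normal(size=(2, 20))
y = X @ rng.normal(size=20)
beta = pcr(X + 0.1 * rng.normal(size=X.shape), y, rank=2)
```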
Weakly Supervised Body Part Parsing with Pose based Part Priors
Title | Weakly Supervised Body Part Parsing with Pose based Part Priors |
Authors | Zhengyuan Yang, Yuncheng Li, Linjie Yang, Ning Zhang, Jiebo Luo |
Abstract | Human body part parsing refers to the task of predicting the semantic segmentation mask for each body part. Fully supervised body part parsing methods achieve good performance, but require an enormous amount of effort to annotate part masks for training. In contrast to the high annotation costs required for a limited number of part mask annotations, a large number of weak labels such as poses and full body masks already exist and contain relevant information. Motivated by the possibility of using existing weak labels, we propose the first weakly supervised body part parsing framework. The basic idea is to train a parsing network with pose-generated part priors that have blank uncertain regions on estimated boundaries, and to use an iterative refinement module to generate new supervision and predictions on these regions. When sufficient extra weak supervision is available, our weakly supervised results (62.0% mIoU) on Pascal-Person-Part are comparable to the fully supervised state-of-the-art results (63.6% mIoU). Furthermore, in the extended semi-supervised setting, the proposed framework outperforms state-of-the-art methods. In addition, we show that the proposed framework can be extended to other keypoint-supervised part parsing tasks such as face parsing. |
Tasks | Semantic Segmentation |
Published | 2019-07-30 |
URL | https://arxiv.org/abs/1907.13051v1 |
https://arxiv.org/pdf/1907.13051v1.pdf | |
PWC | https://paperswithcode.com/paper/weakly-supervised-body-part-parsing-with-pose |
Repo | |
Framework | |
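One way to picture the pose-generated part priors: rasterize each limb as a confident core along the segment between two keypoints, surrounded by a band that is left unlabeled for the iterative refinement module to resolve. A hypothetical rendering sketch assuming OpenCV; the paper's actual prior-generation procedure may differ:

```python
import numpy as np
import cv2  # assumes OpenCV is installed

def limb_prior(shape, joint_a, joint_b, part_id, core=8, band=20, ignore=255):
    """Build a part prior from two pose keypoints: a confident core labeled
    `part_id` plus a surrounding uncertain band labeled `ignore` (blank)."""
    band_mask = np.zeros(shape, np.uint8)
    core_mask = np.zeros(shape, np.uint8)
    cv2.line(band_mask, tuple(joint_a), tuple(joint_b), 1, thickness=band)
    cv2.line(core_mask, tuple(joint_a), tuple(joint_b), 1, thickness=core)
    prior = np.zeros(shape, np.uint8)   # 0 = background
    prior[band_mask == 1] = ignore      # uncertain region near the boundary
    prior[core_mask == 1] = part_id     # confident interior of the limb
    return prior

# prior = limb_prior((256, 256), (60, 80), (120, 200), part_id=3)
```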
As easy as 1, 2… 4? Uncertainty in counting tasks for medical imaging
Title | As easy as 1, 2… 4? Uncertainty in counting tasks for medical imaging |
Authors | Zach Eaton-Rosen, Thomas Varsavsky, Sebastien Ourselin, M. Jorge Cardoso |
Abstract | Counting is a fundamental task in biomedical imaging, and counts serve as important biomarkers in a number of conditions. Estimating the uncertainty in the measurement is thus vital to drawing definite, informed conclusions. In this paper, we first compare a range of existing methods for counting in medical imaging and suggest ways of deriving predictive intervals from them. We then propose and test a method for calculating intervals as an output of a multi-task network. These predictive intervals are optimised to be as narrow as possible, while also enclosing a desired percentage of the data. We demonstrate the effectiveness of this technique on histopathological cell counting and white matter hyperintensity counting. Finally, we offer insight into other areas where this technique may apply. |
Tasks | |
Published | 2019-07-25 |
URL | https://arxiv.org/abs/1907.11555v1 |
https://arxiv.org/pdf/1907.11555v1.pdf | |
PWC | https://paperswithcode.com/paper/as-easy-as-1-2-4-uncertainty-in-counting |
Repo | |
Framework | |
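The interval heads can be trained with a loss that trades interval width against coverage of the target counts. A generic differentiable sketch in PyTorch; the weighting and the soft coverage penalty are illustrative assumptions, not the paper's exact objective:

```python
import torch

def interval_loss(lower, upper, y, width_weight=0.5):
    """Encourage narrow predictive intervals [lower, upper] that still
    enclose the targets y: penalize the mean width plus how far any target
    falls outside its interval (a soft, differentiable coverage term)."""
    width = (upper - lower).mean()
    miss = torch.relu(lower - y) + torch.relu(y - upper)  # 0 when y is inside
    return width_weight * width + miss.mean()
```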
Performance Measurement for Deep Bayesian Neural Network
Title | Performance Measurement for Deep Bayesian Neural Network |
Authors | Yikuan Li, Yajie Zhu |
Abstract | Deep Bayesian neural networks have attracted great attention in recent years, since they combine the benefits of deep neural networks and probability theory. Because of this, the network can make predictions and quantify the uncertainty of those predictions at the same time, which is important in many safety-critical areas. However, most recent research focuses mainly on making the Bayesian neural network easier to train, or on proposing methods to estimate the uncertainty. We notice that very few works properly discuss ways to measure the performance of a Bayesian neural network. Although accuracy and average uncertainty are commonly used for now, they are too general to provide insight into the model. In this paper, we introduce more specific criteria and propose several metrics to measure model performance from different perspectives, including model calibration measurement, data rejection ability, and uncertainty divergence for samples from the same and different distributions. |
Tasks | Calibration |
Published | 2019-03-20 |
URL | http://arxiv.org/abs/1903.08674v2 |
http://arxiv.org/pdf/1903.08674v2.pdf | |
PWC | https://paperswithcode.com/paper/performance-measurement-for-deep-bayesian |
Repo | |
Framework | |
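Among the proposed criteria, calibration measurement is the most standard. The usual Expected Calibration Error bins predictions by confidence and compares each bin's mean confidence with its empirical accuracy; for a Bayesian network the confidence would typically be averaged over Monte Carlo samples. A minimal NumPy sketch of this common formulation (not necessarily the paper's exact metric):

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: weighted average, over confidence bins, of the gap between the
    bin's mean confidence and its empirical accuracy.
    confidences: max predictive probability per sample; correct: bool array."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap   # weight by fraction of samples in the bin
    return ece
```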
Unsupervised inference approach to facial attractiveness
Title | Unsupervised inference approach to facial attractiveness |
Authors | Miguel Ibáñez-Berganza, Gian Luca Lancia, Ambra Amico, Bernardo Monechi, Vittorio Loreto |
Abstract | The perception of facial beauty is a complex phenomenon depending on many detailed and global facial features influencing each other. In the machine learning community this problem is typically tackled as a problem of supervised inference. However, it has been conjectured that this approach does not capture the complexity of the phenomenon. A recent original experiment (Ibáñez-Berganza et al., Scientific Reports 9, 8364, 2019) allowed different human subjects to navigate the face-space and “sculpt” their preferred modification of a reference facial portrait. Here we present an unsupervised inference study of the set of sculpted facial vectors in that experiment. We first infer minimal, interpretable, and faithful probabilistic models (through Maximum Entropy and artificial neural networks) of the preferred facial variations, which capture the origin of the observed inter-subject diversity in the sculpted faces. Applying such generative models to the supervised classification of the gender of the sculpting subjects reveals an astonishingly high prediction accuracy. This result suggests that much relevant information regarding the subjects may influence (and be elicited from) their facial preference criteria, in agreement with the multiple motive theory of attractiveness proposed in previous works. |
Tasks | |
Published | 2019-10-30 |
URL | https://arxiv.org/abs/1910.14072v2 |
https://arxiv.org/pdf/1910.14072v2.pdf | |
PWC | https://paperswithcode.com/paper/unsupervised-inference-approach-to-facial |
Repo | |
Framework | |
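For intuition on the Maximum Entropy models mentioned above: if only the first two moments of the sculpted facial vectors are constrained, the maximum-entropy distribution is a multivariate Gaussian, so fitting reduces to estimating a mean and a covariance. A minimal sketch of that baseline case; the paper's actual models are richer:

```python
import numpy as np

def fit_maxent_gaussian(face_vectors):
    """With mean and covariance constraints, the MaxEnt model is Gaussian:
    fitting it is just moment estimation. face_vectors: (n_subjects, dim)."""
    return face_vectors.mean(axis=0), np.cov(face_vectors, rowvar=False)

def sample_faces(mu, cov, n, seed=0):
    """Draw new facial variation vectors from the fitted generative model."""
    return np.random.default_rng(seed).multivariate_normal(mu, cov, size=n)
```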
Disentangle, align and fuse for multimodal and zero-shot image segmentation
Title | Disentangle, align and fuse for multimodal and zero-shot image segmentation |
Authors | Agisilaos Chartsias, Giorgos Papanastasiou, Chengjia Wang, Scott Semple, David Newby, Rohan Dharmakumar, Sotirios A. Tsaftaris |
Abstract | Magnetic resonance (MR) protocols rely on several sequences to properly assess pathology and organ status. Yet, despite advances in image analysis, we tend to treat each sequence, here termed modality, in isolation. Taking advantage of the information shared between modalities (largely an organ’s anatomy) is beneficial for multi-modality multi-input processing and learning. However, we must overcome inherent anatomical misregistrations and disparities in signal intensity across the modalities to claim this benefit. We present a method that offers improved segmentation accuracy of the modality of interest (over a single-input model) by learning to leverage information present in other modalities, enabling semi-supervised and zero-shot learning. Core to our method is learning a disentangled decomposition into anatomical and imaging factors. Shared anatomical factors from the different inputs are jointly processed and fused to extract more accurate segmentation masks. Image misregistrations are corrected with a Spatial Transformer Network, which non-linearly aligns the anatomical factors. The imaging factor captures signal intensity characteristics across different modality data, and is used for image reconstruction, enabling semi-supervised learning. Temporal and slice pairing between inputs is learned dynamically. We demonstrate applications in Late Gadolinium Enhanced (LGE) and Blood Oxygenation Level Dependent (BOLD) cardiac segmentation, as well as in T2 abdominal segmentation. |
Tasks | Cardiac Segmentation, Image Reconstruction, Semantic Segmentation, Zero-Shot Learning |
Published | 2019-11-11 |
URL | https://arxiv.org/abs/1911.04417v2 |
https://arxiv.org/pdf/1911.04417v2.pdf | |
PWC | https://paperswithcode.com/paper/disentangle-align-and-fuse-for-multimodal-and |
Repo | |
Framework | |
A Cascade Sequence-to-Sequence Model for Chinese Mandarin Lip Reading
Title | A Cascade Sequence-to-Sequence Model for Chinese Mandarin Lip Reading |
Authors | Ya Zhao, Rui Xu, Mingli Song |
Abstract | Lip reading aims at decoding texts from the movement of a speaker’s mouth. In recent years, lip reading methods have made great progress for English, at both the word level and the sentence level. Unlike English, however, Chinese Mandarin is a tone-based language and relies on pitch to distinguish lexical or grammatical meaning, which significantly increases the ambiguity of the lip reading task. In this paper, we propose a Cascade Sequence-to-Sequence Model for Chinese Mandarin (CSSMCM) lip reading, which explicitly models tones when predicting sentences. Tones are modeled based on visual information and syntactic structure, and are in turn used, together with visual information and syntactic structure, to predict the sentence. In order to evaluate CSSMCM, a dataset called CMLR (Chinese Mandarin Lip Reading) is collected and released, consisting of over 100,000 natural sentences from the China Network Television website. When trained on the CMLR dataset, the proposed CSSMCM surpasses the performance of state-of-the-art lip reading frameworks, which confirms the effectiveness of explicitly modeling tones for Chinese Mandarin lip reading. |
Tasks | |
Published | 2019-08-14 |
URL | https://arxiv.org/abs/1908.04917v2 |
https://arxiv.org/pdf/1908.04917v2.pdf | |
PWC | https://paperswithcode.com/paper/a-cascade-sequence-to-sequence-model-for |
Repo | |
Framework | |
LiteEval: A Coarse-to-Fine Framework for Resource Efficient Video Recognition
Title | LiteEval: A Coarse-to-Fine Framework for Resource Efficient Video Recognition |
Authors | Zuxuan Wu, Caiming Xiong, Yu-Gang Jiang, Larry S. Davis |
Abstract | This paper presents LiteEval, a simple yet effective coarse-to-fine framework for resource efficient video recognition, suitable for both online and offline scenarios. Exploiting decent yet computationally efficient features derived at a coarse scale with a lightweight CNN model, LiteEval dynamically decides on-the-fly whether to compute more powerful features for incoming video frames at a finer scale to obtain more details. This is achieved by a coarse LSTM and a fine LSTM operating cooperatively, as well as a conditional gating module to learn when to allocate more computation. Extensive experiments are conducted on two large-scale video benchmarks, FCVID and ActivityNet, and the results demonstrate LiteEval requires substantially less computation while offering excellent classification accuracy for both online and offline predictions. |
Tasks | Video Recognition |
Published | 2019-12-03 |
URL | https://arxiv.org/abs/1912.01601v1 |
https://arxiv.org/pdf/1912.01601v1.pdf | |
PWC | https://paperswithcode.com/paper/liteeval-a-coarse-to-fine-framework-for-1 |
Repo | |
Framework | |
Visualizing Deep Neural Networks for Speech Recognition with Learned Topographic Filter Maps
Title | Visualizing Deep Neural Networks for Speech Recognition with Learned Topographic Filter Maps |
Authors | Andreas Krug, Sebastian Stober |
Abstract | The uninformative ordering of artificial neurons in Deep Neural Networks complicates visualizing activations in deeper layers. This is one reason why the internal structure of such models is very unintuitive. In neuroscience, activity of real brains can be visualized by highlighting active regions. Inspired by those techniques, we train a convolutional speech recognition model where filters are arranged in a 2D grid and neighboring filters are similar to each other. We show how those topographic filter maps visualize artificial neuron activations more intuitively. Moreover, we investigate whether this causes phoneme-responsive neurons to be grouped in certain regions of the topographic map. |
Tasks | Speech Recognition |
Published | 2019-12-06 |
URL | https://arxiv.org/abs/1912.04067v1 |
https://arxiv.org/pdf/1912.04067v1.pdf | |
PWC | https://paperswithcode.com/paper/visualizing-deep-neural-networks-for-speech |
Repo | |
Framework | |
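A simple way to make filters similar to their neighbors on a 2D grid is to add a smoothness penalty over the arranged filters during training. A hypothetical PyTorch add-on loss illustrating the idea; the authors' actual training scheme may differ:

```python
import torch

def topographic_penalty(filters, grid_h, grid_w):
    """Penalize squared differences between horizontally and vertically
    adjacent filters arranged on a (grid_h x grid_w) map.
    filters: tensor of shape (grid_h * grid_w, filter_dim)."""
    f = filters.view(grid_h, grid_w, -1)
    horiz = (f[:, 1:] - f[:, :-1]).pow(2).mean()
    vert = (f[1:, :] - f[:-1, :]).pow(2).mean()
    return horiz + vert

# e.g. loss = task_loss + lam * topographic_penalty(conv.weight.flatten(1), 8, 8)
```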
Direct Policy Gradients: Direct Optimization of Policies in Discrete Action Spaces
Title | Direct Policy Gradients: Direct Optimization of Policies in Discrete Action Spaces |
Authors | Guy Lorberbom, Chris J. Maddison, Nicolas Heess, Tamir Hazan, Daniel Tarlow |
Abstract | Direct optimization is an appealing approach to differentiating through discrete quantities. Rather than relying on REINFORCE or continuous relaxations of discrete structures, it uses optimization in discrete space to compute gradients through a discrete argmax operation. In this paper, we develop reinforcement learning algorithms that use direct optimization to compute gradients of the expected return in environments with discrete actions. We call the resulting algorithms “direct policy gradient” algorithms and investigate their properties, showing that there is a built-in variance reduction technique and that a parameter that was previously viewed as a numerical approximation can be interpreted as controlling risk sensitivity. We also tackle challenges in algorithm design, leveraging ideas from A$^\star$ Sampling to develop a practical algorithm. Empirically, we show that the algorithm performs well in illustrative domains, and that it can make use of domain knowledge about upper bounds on return-to-go to speed up training. |
Tasks | |
Published | 2019-06-14 |
URL | https://arxiv.org/abs/1906.06062v1 |
https://arxiv.org/pdf/1906.06062v1.pdf | |
PWC | https://paperswithcode.com/paper/direct-policy-gradients-direct-optimization |
Repo | |
Framework | |
Learning Efficient Video Representation with Video Shuffle Networks
Title | Learning Efficient Video Representation with Video Shuffle Networks |
Authors | Pingchuan Ma, Yao Zhou, Yu Lu, Wei Zhang |
Abstract | 3D CNNs show a strong ability to learn spatiotemporal representations in recent video recognition tasks. However, inflating 2D convolutions to 3D inevitably introduces additional computational costs, making them cumbersome in practical deployment. We consider whether there is a way to equip the conventional 2D convolution with temporal awareness without expanding its kernel. To this end, we propose the video shuffle, a parameter-free plug-in component that efficiently reallocates the inputs of 2D convolution so that its receptive field can be extended to the temporal dimension. In practice, video shuffle first divides each frame's features into multiple groups and then aggregates the grouped features via a temporal shuffle operation. This allows the following 2D convolution to aggregate global spatiotemporal features. The proposed video shuffle can be flexibly inserted into popular 2D CNNs, forming the Video Shuffle Networks (VSN). With a simple yet efficient implementation, VSN performs surprisingly well on temporal modeling benchmarks. In experiments, VSN not only gains non-trivial improvements on Kinetics and Moments in Time, but also achieves state-of-the-art performance on the Something-Something-V1 and Something-Something-V2 datasets. |
Tasks | Video Recognition |
Published | 2019-11-26 |
URL | https://arxiv.org/abs/1911.11319v1 |
https://arxiv.org/pdf/1911.11319v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-efficient-video-representation-with |
Repo | |
Framework | |
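The shuffle itself is parameter-free: split each frame's channels into groups and exchange the group and time axes, so that a following 2D convolution sees channels drawn from several frames. A minimal PyTorch sketch of one such shuffle; the paper's exact grouping and restoring scheme may differ:

```python
import torch

def temporal_shuffle(x, groups):
    """x: (N, T, C, H, W). Reshape channels into `groups` blocks, swap the
    time and group axes, and flatten back, so each output step's channels
    mix features coming from multiple input frames."""
    n, t, c, h, w = x.shape
    assert c % groups == 0
    x = x.view(n, t, groups, c // groups, h, w)
    x = x.transpose(1, 2).contiguous()   # (N, groups, T, C//groups, H, W)
    return x.view(n, t, c, h, w)

# feats = temporal_shuffle(feats, groups=4)  # then apply a 2D conv per frame
```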