January 26, 2020


Paper Group ANR 1512


Event Recognition with Automatic Album Detection based on Sequential Processing, Neural Attention and Image Captioning


Title	Event Recognition with Automatic Album Detection based on Sequential Processing, Neural Attention and Image Captioning
Authors	Andrey V. Savchenko
Abstract	In this paper, a new formulation of the event recognition task is examined: it is required to predict event categories in a gallery of images for which albums (groups of photos corresponding to a single event) are unknown. We propose a novel two-stage approach. At first, features are extracted from each photo using a pre-trained convolutional neural network (CNN). These features are classified individually. The scores of the classifier are used to group sequential photos into several clusters. Finally, the features of photos in each group are aggregated into a single descriptor using a neural attention mechanism. This algorithm is optionally extended to improve the accuracy of classifying each image in an album. In contrast to conventional fine-tuning of CNNs, we propose to use image captioning, i.e., a generative model that converts images to textual descriptions. The captions are one-hot encoded and summarized into a sparse feature vector suitable for training an arbitrary classifier. An experimental study with the Photo Event Collection and the Multi-Label Curation of Flickr Events Dataset demonstrates that our approach is 9-20% more accurate than event recognition on single photos. Moreover, the proposed method has a 13-16% lower error rate than classification of groups of photos obtained with hierarchical clustering. It is experimentally shown that image captions from a model trained on the Conceptual Captions dataset can be classified more accurately than the features from an object detector, though both are not as rich as the CNN-based features. However, it is possible to combine our approach with conventional CNNs in an ensemble to provide state-of-the-art results for several event datasets.
Tasks	Image Captioning
Published	2019-11-25
URL	https://arxiv.org/abs/1911.11010v2
PDF	https://arxiv.org/pdf/1911.11010v2.pdf
PWC	https://paperswithcode.com/paper/event-recognition-with-automatic-album
Repo
Framework
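
The two stages described in the abstract above (grouping sequential photos by classifier scores, then attention-based aggregation of features within each group) can be illustrated with a minimal sketch. The L1-distance rule, the `threshold` value, and the dot-product attention with a `query` vector are assumptions made for illustration; the abstract does not specify the paper's actual grouping criterion or attention network.

```python
import math

def group_sequential(scores, threshold=0.5):
    """Split a photo sequence into groups of consecutive photos.

    scores: list of per-photo classifier score vectors, in gallery order.
    A new group starts when the L1 distance between neighbouring score
    vectors exceeds `threshold` (hypothetical rule, not the paper's).
    """
    if not scores:
        return []
    groups = [[0]]
    for i in range(1, len(scores)):
        dist = sum(abs(a - b) for a, b in zip(scores[i - 1], scores[i]))
        if dist > threshold:
            groups.append([i])      # scores changed a lot: start a new album
        else:
            groups[-1].append(i)    # similar scores: same album
    return groups

def attention_pool(features, query):
    """Aggregate feature vectors into one descriptor.

    Weights are a softmax over dot products with `query` (an assumed,
    normally learned, parameter vector).
    """
    logits = [sum(f_d * q_d for f_d, q_d in zip(f, query)) for f in features]
    top = max(logits)
    exps = [math.exp(l - top) for l in logits]   # numerically stable softmax
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(features[0])
    return [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
```

With a zero `query`, the weights are uniform and the descriptor reduces to the plain average of the group's features, which is a convenient sanity check.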

Reproducibility in Machine Learning for Health


Title	Reproducibility in Machine Learning for Health
Authors	Matthew B. A. McDermott, Shirly Wang, Nikki Marinsek, Rajesh Ranganath, Marzyeh Ghassemi, Luca Foschini
Abstract	Machine learning algorithms designed to characterize, monitor, and intervene on human health (ML4H) are expected to perform safely and reliably when operating at scale, potentially outside strict human supervision. This requirement warrants a stricter attention to issues of reproducibility than other fields of machine learning. In this work, we conduct a systematic evaluation of over 100 recently published ML4H research papers along several dimensions related to reproducibility. We find that the field of ML4H compares poorly to more established machine learning fields, particularly concerning data and code accessibility. Finally, drawing from success in other fields of science, we propose recommendations to data providers, academic publishers, and the ML4H research community in order to promote reproducible research moving forward.
Tasks
Published	2019-07-02
URL	https://arxiv.org/abs/1907.01463v1
PDF	https://arxiv.org/pdf/1907.01463v1.pdf
PWC	https://paperswithcode.com/paper/reproducibility-in-machine-learning-for
Repo
Framework

Training DNN IoT Applications for Deployment On Analog NVM Crossbars


Title	Training DNN IoT Applications for Deployment On Analog NVM Crossbars
Authors	Fernando García-Redondo, Shidhartha Das, Glen Rosendale
Abstract	Deep Neural Network (DNN) applications are increasingly being deployed in always-on IoT devices. However, the limited resources in tiny microcontroller units (MCUs) limit the deployment of the required Machine Learning (ML) models. Therefore, alternatives to traditional architectures, such as Computation-In-Memory based on resistive nonvolatile memories (NVM), which promise high integration density, low power consumption and massively parallel computation capabilities, are under study. However, these technologies are still immature and suffer from problems intrinsic to their analog nature: noise, non-linearities, inability to represent negative values, and limited precision per device. Consequently, mapping DNNs to NVM crossbars requires the full-custom design of each one of the DNN layers, involving finely tuned blocks such as ADC/DACs or current subtractors/adders, and thus limiting the chip reconfigurability. This paper presents an NVM-aware framework to efficiently train and map the DNN to the NVM hardware. We propose the first method that trains the NN weights while ensuring uniformity across layer weights/activations, improving the re-usability of HW blocks. Firstly, this quantization algorithm obtains uniform scaling across the DNN layers independently of their characteristics, removing the need for per-layer full-custom design while reducing the peripheral HW. Secondly, for certain applications we make use of Network Architecture Search to avoid using negative weights. Unipolar weight matrices translate into simpler analog periphery and lead to a 67% area improvement and up to 40% power reduction. We validate our idea with CIFAR10 and HAR applications by mapping to crossbars using 4-bit and 2-bit devices. Up to 92.91% accuracy (95% in floating point) can be achieved using 2-bit only-positive weights for HAR.
Tasks	Quantization
Published	2019-10-30
URL	https://arxiv.org/abs/1910.13850v2
PDF	https://arxiv.org/pdf/1910.13850v2.pdf
PWC	https://paperswithcode.com/paper/training-dnn-iot-applications-for-deployment
Repo
Framework
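
To make the idea of low-bit, only-positive weights with one shared scale more concrete, here is a minimal sketch of generic uniform quantization. It is not the paper's training algorithm; the function names and the choice of a single `w_max` for all layers are assumptions for illustration.

```python
def quantize_unipolar(weights, bits, w_max):
    """Map weights onto integer levels 0 .. 2**bits - 1 using one shared scale.

    Negative weights are clipped to level 0 and weights above `w_max` to the
    top level, so every stored value is non-negative (unipolar).
    """
    levels = 2 ** bits - 1
    scale = w_max / levels
    q = [min(levels, max(0, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate real-valued weights from integer levels."""
    return [v * scale for v in q]

# Example: 2-bit devices give four levels (0, 1, 2, 3).
q, scale = quantize_unipolar([0.0, 0.4, 1.6, 2.9, 4.0, -1.0], bits=2, w_max=3.0)
```

Because the same `scale` is reused for every layer in this sketch, the converter that reads the levels back would not need per-layer tuning, which is the intuition behind uniform scaling.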

One-Shot Mutual Affine-Transfer for Photorealistic Stylization


Title	One-Shot Mutual Affine-Transfer for Photorealistic Stylization
Authors	Ying Qu, Zhenzhou Shao, Hairong Qi
Abstract	Photorealistic style transfer aims to transfer the style of a reference photo onto a content photo naturally, such that the stylized image looks like a real photo taken by a camera. Existing state-of-the-art methods are prone to spatial structure distortion of the content image and global color inconsistency across different semantic objects, making the results less photorealistic. In this paper, we propose a one-shot mutual Dirichlet network, to address these challenging issues. The essential contribution of the work is the realization of a representation scheme that successfully decouples the spatial structure and color information of images, such that the spatial structure can be well preserved during stylization. This representation is discriminative and context-sensitive with respect to semantic objects. It is extracted with a shared sparse Dirichlet encoder. Moreover, such representation is encouraged to be matched between the content and style images for faithful color transfer. The affine-transfer model is embedded in the decoder of the network to facilitate the color transfer. The strong representative and discriminative power of the proposed network enables one-shot learning given only one content-style image pair. Experimental results demonstrate that the proposed method is able to generate photorealistic photos without spatial distortion or abrupt color changes.
Tasks	One-Shot Learning, Style Transfer
Published	2019-07-24
URL	https://arxiv.org/abs/1907.10274v1
PDF	https://arxiv.org/pdf/1907.10274v1.pdf
PWC	https://paperswithcode.com/paper/one-shot-mutual-affine-transfer-for
Repo
Framework

Towards Linearization Machine Learning Algorithms


Title	Towards Linearization Machine Learning Algorithms
Authors	Steve Tueno
Abstract	This paper is about a machine learning approach based on the multilinear projection of an unknown function (or probability distribution) to be estimated towards a linear (or multilinear) dimensional space E’. The proposal transforms the problem of predicting the target of an observation x into a problem of determining a consensus among the k nearest neighbors of x’s image within the dimensional space E’. The algorithms that concretize it allow both regression and binary classification. Implementations carried out using Scala/Spark and assessed on a dozen LIBSVM datasets have demonstrated improvements in prediction accuracies in comparison with other prediction algorithms implemented within Spark MLLib such as multilayer perceptrons, logistic regression classifiers and random forests.
Tasks
Published	2019-08-14
URL	https://arxiv.org/abs/1908.06871v1
PDF	https://arxiv.org/pdf/1908.06871v1.pdf
PWC	https://paperswithcode.com/paper/towards-linearization-machine-learning
Repo
Framework
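
The abstract above describes prediction as a consensus among the k nearest neighbors of an observation's image in a projected space E'. The sketch below illustrates only that consensus step. The linear map `basis` is a hypothetical stand-in for the paper's multilinear projection, and the mean and majority rules are assumed forms of consensus.

```python
def project(x, basis):
    """Image of x under a linear map given by the rows of `basis` (assumed)."""
    return [sum(b * v for b, v in zip(row, x)) for row in basis]

def knn_consensus(train_x, train_y, query, basis, k=3, classify=False):
    """Predict the target of `query` from its k nearest projected neighbours.

    Regression: mean of the neighbours' targets.
    Binary classification (labels 0/1): majority vote via the mean.
    """
    q = project(query, basis)
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(project(x, basis), q)), y)
        for x, y in zip(train_x, train_y)
    )
    neighbours = [y for _, y in dists[:k]]
    mean = sum(neighbours) / len(neighbours)
    return int(mean >= 0.5) if classify else mean
```

In this sketch, coordinates that the projection discards have no influence on which neighbours are chosen.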

Modeling Daily Pan Evaporation in Humid Climates Using Gaussian Process Regression


Title	Modeling Daily Pan Evaporation in Humid Climates Using Gaussian Process Regression
Authors	Sevda Shabani, Saeed Samadianfard, Mohammad Taghi Sattari, Shahab Shamshirband, Amir Mosavi, Tibor Kmet, Annamaria R. Varkonyi-Koczy
Abstract	Evaporation is one of the main processes in the hydrological cycle, and it is one of the most critical factors in agricultural, hydrological, and meteorological studies. Due to the interactions of multiple climatic factors, evaporation is a complex and nonlinear phenomenon; therefore, data-based methods can be used to estimate it precisely. In the present study, Gaussian Process Regression (GPR), Nearest-Neighbor, Random Forest and Support Vector Regression were used to estimate pan evaporation (PE) at the meteorological stations of Golestan Province, Iran. For this purpose, meteorological data including PE, temperature (T), relative humidity (RH), wind speed (W) and sunny hours (S) were collected from the Gonbad-e Kavus, Gorgan and Bandar Torkman stations from 2011 through 2017. The accuracy of the studied methods was determined using the statistical indices of Root Mean Squared Error, correlation coefficient and Mean Absolute Error. Furthermore, Taylor charts were utilized for evaluating the accuracy of the mentioned models. We report that GPR for the Gonbad-e Kavus station with input parameters T, W and S, and GPR for the Gorgan and Bandar Torkman stations with input parameters T, RH, W and S, were the most accurate and are proposed for precise estimation of PE. Given the high rate of evaporation in Iran and the lack of measurement instruments, the findings of the current study indicate that PE values can be estimated accurately from a few easily measured meteorological parameters.
Tasks
Published	2019-08-01
URL	https://arxiv.org/abs/1908.04267v1
PDF	https://arxiv.org/pdf/1908.04267v1.pdf
PWC	https://paperswithcode.com/paper/modeling-daily-pan-evaporation-in-humid
Repo
Framework
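
The three accuracy indices named in the abstract above (Root Mean Squared Error, Mean Absolute Error and the correlation coefficient) can be computed as follows. This is a minimal sketch that assumes the Pearson form of the correlation coefficient; the sample values are made up for illustration and are not data from the study.

```python
import math

def rmse(y_true, y_pred):
    """Root Mean Squared Error between observed and estimated values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean Absolute Error between observed and estimated values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between observed and estimated values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)

# Illustrative values only: every estimate is off by exactly one unit.
observed = [1.0, 2.0, 3.0, 4.0]
estimated = [2.0, 3.0, 4.0, 5.0]
```

For these illustrative values the error indices both equal 1 while the correlation is perfect, which shows why a study would report error indices alongside correlation: a constant bias is invisible to the correlation coefficient.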

CRUR: Coupled-Recurrent Unit for Unification, Conceptualization and Context Capture for Language Representation – A Generalization of Bi Directional LSTM


Title	CRUR: Coupled-Recurrent Unit for Unification, Conceptualization and Context Capture for Language Representation – A Generalization of Bi Directional LSTM
Authors	Chiranjib Sur
Abstract	In this work, we analyze a novel concept: a network capable of sequential-binding-based learning, built on the coupling of recurrent units with a Bayesian prior definition. The coupling structure generates efficient tensor representations that can be decoded to generate efficient sentences and can describe certain events. These descriptions are derived from structural representations of visual features of images and media. The different types of coupling recurrent structures are studied in detail, and some insights into their performance are provided. Supervised learning performance for natural language processing is judged based on statistical evaluations; however, the truth is perspective, and in this case the qualitative evaluations reveal the real capability of the different architectural strengths and variations. The Bayesian prior definition of different embeddings helps in better characterization of the sentences based on the natural language structure related to parts of speech and other semantic-level categorization, in a form which is machine-interpretable and inherits the characteristics of the Tensor Representation binding and unbinding based on mutual orthogonality. Our approach has surpassed some of the existing basic works related to image captioning.
Tasks	Image Captioning
Published	2019-11-22
URL	https://arxiv.org/abs/1911.10132v1
PDF	https://arxiv.org/pdf/1911.10132v1.pdf
PWC	https://paperswithcode.com/paper/crur-coupled-recurrent-unit-for-unification
Repo
Framework

TPsgtR: Neural-Symbolic Tensor Product Scene-Graph-Triplet Representation for Image Captioning


Title	TPsgtR: Neural-Symbolic Tensor Product Scene-Graph-Triplet Representation for Image Captioning
Authors	Chiranjib Sur
Abstract	Image captioning can be improved if the structure of the graphical representations can be formulated with conceptual positional binding. In this work, we introduce a novel technique for caption generation using the neural-symbolic encoding of scene graphs derived from regional visual information of the images, and we call it Tensor Product Scene-Graph-Triplet Representation (TP$_{sgt}$R). While most previous works concentrated on identifying object features in images, we introduce a neuro-symbolic embedding that can embed identified relationships among different regions of the image into concrete forms, instead of relying on the model to compose them for any/all combinations. This neural-symbolic representation helps in better definition of the neural-symbolic space for neuro-symbolic attention and can be transformed into better captions. With this approach, we introduce two novel architectures (TP$_{sgt}$R-TDBU and TP$_{sgt}$R-sTDBU) for comparison, and the experimental results demonstrate that our approaches outperform the other models and that the generated captions are more comprehensive and natural.
Tasks	Image Captioning
Published	2019-11-22
URL	https://arxiv.org/abs/1911.10115v1
PDF	https://arxiv.org/pdf/1911.10115v1.pdf
PWC	https://paperswithcode.com/paper/tpsgtr-neural-symbolic-tensor-product-scene
Repo
Framework

Machine Learning at the Wireless Edge: Distributed Stochastic Gradient Descent Over-the-Air


Title	Machine Learning at the Wireless Edge: Distributed Stochastic Gradient Descent Over-the-Air
Authors	Mohammad Mohammadi Amiri, Deniz Gunduz
Abstract	We study collaborative machine learning (ML) at the wireless edge, where power and bandwidth-limited wireless devices with local datasets carry out distributed stochastic gradient descent (DSGD) with the help of a remote parameter server (PS). Standard approaches assume separate computation and communication, where local gradient estimates are compressed and communicated to the PS over orthogonal links. Following this digital approach, we introduce D-DSGD, in which the wireless terminals, referred to as the workers, employ gradient quantization and error accumulation, and transmit their gradient estimates to the PS over the underlying wireless multiple access channel (MAC). We then introduce an analog scheme, called A-DSGD, which exploits the additive nature of the wireless MAC for over-the-air gradient computation. In A-DSGD, the workers first sparsify their gradient estimates, and then project them to a lower dimensional space imposed by the available channel bandwidth. These projections are transmitted directly over the MAC without employing any digital code. Numerical results show that A-DSGD converges much faster than D-DSGD thanks to its more efficient use of the limited bandwidth and the natural alignment of the gradient estimates over the channel. The improvement is particularly compelling at low power and low bandwidth regimes. We also observe that the performance of A-DSGD improves with the number of workers (keeping the total size of the dataset constant), while D-DSGD deteriorates, limiting the ability of the latter in harnessing the computation power of edge devices. The lack of quantization and channel encoding/decoding in A-DSGD further speeds up communication, making it very attractive for low-latency ML applications at the wireless network edge.
Tasks	Quantization
Published	2019-01-03
URL	http://arxiv.org/abs/1901.00844v2
PDF	http://arxiv.org/pdf/1901.00844v2.pdf
PWC	https://paperswithcode.com/paper/machine-learning-at-the-wireless-edge
Repo
Framework
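
The analog scheme in the abstract above rests on three steps: each worker sparsifies its gradient, projects it to a lower-dimensional space, and the channel adds the transmitted signals. The sketch below illustrates these steps under simplifying assumptions: top-k sparsification, a shared projection matrix known to all workers, and an idealized additive channel. Power control and the recovery of the gradient at the parameter server are omitted, and all function names are hypothetical.

```python
import random

def top_k_sparsify(grad, k):
    """Keep the k largest-magnitude entries of the gradient, zero the rest."""
    order = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)
    keep = set(order[:k])
    return [g if i in keep else 0.0 for i, g in enumerate(grad)]

def project(matrix, vec):
    """Project a vector to a lower dimension (one output per matrix row)."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]

def over_the_air_sum(signals, noise_std=0.0, rng=None):
    """Idealized multiple access channel: signals add up, plus Gaussian noise."""
    rng = rng or random.Random(0)
    length = len(signals[0])
    return [sum(s[j] for s in signals) + rng.gauss(0.0, noise_std)
            for j in range(length)]

# Two workers share the same (assumed) 2 x 4 projection matrix.
A = [[1.0, 0.0, 2.0, -1.0],
     [0.0, 1.0, 1.0, 1.0]]
g1 = top_k_sparsify([0.1, -3.0, 2.0, 0.5], k=2)
g2 = [1.0, 0.0, 0.0, 4.0]
received = over_the_air_sum([project(A, g1), project(A, g2)])
```

Because the projection is linear, the noiseless received signal equals the projection of the summed gradients, so the channel itself performs the aggregation that a digital scheme would compute after decoding each worker separately.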

Statistical physics of unsupervised learning with prior knowledge in neural networks


Title	Statistical physics of unsupervised learning with prior knowledge in neural networks
Authors	Tianqi Hou, Haiping Huang
Abstract	Integrating sensory inputs with prior beliefs from past experiences in unsupervised learning is a common and fundamental characteristic of brain or artificial neural computation. However, a quantitative role of prior knowledge in unsupervised learning remains unclear, prohibiting a scientific understanding of unsupervised learning. Here, we propose a statistical physics model of unsupervised learning with prior knowledge, revealing that the sensory inputs drive a series of continuous phase transitions related to spontaneous intrinsic-symmetry breaking. The intrinsic symmetry includes both reverse symmetry and permutation symmetry, commonly observed in most artificial neural networks. Compared to the prior-free scenario, the prior reduces more strongly the minimal data size triggering the reverse symmetry breaking transition, and moreover, the prior merges, rather than separates, permutation symmetry breaking phases. We claim that the prior can be learned from data samples, which in physics corresponds to a two-parameter Nishimori plane constraint. This work thus reveals mechanisms about the influence of the prior on unsupervised learning.
Tasks
Published	2019-11-06
URL	https://arxiv.org/abs/1911.02344v1
PDF	https://arxiv.org/pdf/1911.02344v1.pdf
PWC	https://paperswithcode.com/paper/statistical-physics-of-unsupervised-learning
Repo
Framework

Case-Based Histopathological Malignancy Diagnosis using Convolutional Neural Networks


Title	Case-Based Histopathological Malignancy Diagnosis using Convolutional Neural Networks
Authors	Qicheng Lao, Thomas Fevens
Abstract	In practice, histopathological diagnosis of tumor malignancy often requires a human expert to scan through histopathological images at multiple magnification levels, after which a final diagnosis can be accurately determined. However, previous research on such classification tasks using convolutional neural networks primarily determine a diagnosis for a single magnification level. In this paper, we propose a case-based approach using deep residual neural networks for histopathological malignancy diagnosis, where a case is defined as a sequence of images from the patient at all available levels of magnification. Effectively, through mimicking what a human expert would actually do, our approach makes a diagnosis decision based on features learned in combination at multiple magnification levels. Our results show that the case-based approach achieves better performance than the state-of-the-art methods when evaluated on BreaKHis, a histopathological image dataset for breast tumors.
Tasks
Published	2019-05-28
URL	https://arxiv.org/abs/1905.11567v1
PDF	https://arxiv.org/pdf/1905.11567v1.pdf
PWC	https://paperswithcode.com/paper/case-based-histopathological-malignancy
Repo
Framework

Can Neural Image Captioning be Controlled via Forced Attention?


Title	Can Neural Image Captioning be Controlled via Forced Attention?
Authors	Philipp Sadler, Tatjana Scheffler, David Schlangen
Abstract	Learned dynamic weighting of the conditioning signal (attention) has been shown to improve neural language generation in a variety of settings. The weights applied when generating a particular output sequence have also been viewed as providing a potentially explanatory insight into the internal workings of the generator. In this paper, we reverse the direction of this connection and ask whether through the control of the attention of the model we can control its output. Specifically, we take a standard neural image captioning model that uses attention, and fix the attention to pre-determined areas in the image. We evaluate whether the resulting output is more likely to mention the class of the object in that area than the normally generated caption. We introduce three effective methods to control the attention and find that these are producing expected results in up to 28.56% of the cases.
Tasks	Image Captioning, Text Generation
Published	2019-11-10
URL	https://arxiv.org/abs/1911.03936v1
PDF	https://arxiv.org/pdf/1911.03936v1.pdf
PWC	https://paperswithcode.com/paper/can-neural-image-captioning-be-controlled-via
Repo
Framework

Forensic shoe-print identification: a brief survey


Title	Forensic shoe-print identification: a brief survey
Authors	Imad Rida, Sambit Bakshi, Hugo Proença, Lunke Fei, Amine Nait-Ali, Abdenour Hadid
Abstract	As an advanced research topic in forensic science, automatic shoe-print identification has been extensively studied in the last two decades, since shoe marks are the clues most frequently left at a crime scene. Hence, these impressions provide pertinent evidence for the proper progress of investigations in order to identify potential criminals. The main goal of this survey is to provide a cohesive overview of the research carried out in forensic shoe-print identification and its basic background. Apart from defining the problem and describing the phases that typically compose the processing chain of shoe-print identification, we provide a summary/comparison of the state-of-the-art approaches, in order to guide the neophyte and help to advance the research topic. This is done by introducing simple and basic taxonomies as well as summaries of the state-of-the-art performance. Lastly, we discuss the current open problems and challenges in this research topic and point out promising directions in this field.
Tasks
Published	2019-01-05
URL	http://arxiv.org/abs/1901.01431v2
PDF	http://arxiv.org/pdf/1901.01431v2.pdf
PWC	https://paperswithcode.com/paper/forensic-shoe-print-identification-a-brief
Repo
Framework

$n$-ML: Mitigating Adversarial Examples via Ensembles of Topologically Manipulated Classifiers


Title	$n$-ML: Mitigating Adversarial Examples via Ensembles of Topologically Manipulated Classifiers
Authors	Mahmood Sharif, Lujo Bauer, Michael K. Reiter
Abstract	This paper proposes a new defense called $n$-ML against adversarial examples, i.e., inputs crafted by perturbing benign inputs by small amounts to induce misclassifications by classifiers. Inspired by $n$-version programming, $n$-ML trains an ensemble of $n$ classifiers, and inputs are classified by a vote of the classifiers in the ensemble. Unlike prior such approaches, however, the classifiers in the ensemble are trained specifically to classify adversarial examples differently, rendering it very difficult for an adversarial example to obtain enough votes to be misclassified. We show that $n$-ML roughly retains the benign classification accuracies of state-of-the-art models on the MNIST, CIFAR10, and GTSRB datasets, while simultaneously defending against adversarial examples with better resilience than the best defenses known to date and, in most cases, with lower classification-time overhead.
Tasks
Published	2019-12-19
URL	https://arxiv.org/abs/1912.09059v1
PDF	https://arxiv.org/pdf/1912.09059v1.pdf
PWC	https://paperswithcode.com/paper/n-ml-mitigating-adversarial-examples-via
Repo
Framework
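
The voting step described in the abstract above can be illustrated with a short sketch. The `min_votes` threshold and the choice to return `None` when no class gathers enough votes are assumptions for illustration; the abstract states only that inputs are classified by a vote and that adversarial examples struggle to obtain enough votes.

```python
from collections import Counter

def ensemble_vote(predictions, min_votes):
    """Return the most-voted label if it reaches `min_votes`, else None.

    predictions: one predicted label per classifier in the ensemble.
    A None result marks an input on which the classifiers disagree too much,
    which this sketch treats as suspicious rather than forcing a decision.
    """
    label, votes = Counter(predictions).most_common(1)[0]
    return label if votes >= min_votes else None

# Benign-looking case: four of five classifiers agree.
agreed = ensemble_vote(["cat", "cat", "dog", "cat", "cat"], min_votes=4)
# Disagreement case: no label reaches the threshold.
split = ensemble_vote(["cat", "dog", "bird", "dog", "fish"], min_votes=4)
```

If the classifiers are trained to disagree on perturbed inputs, a single perturbation is unlikely to push enough of them toward the same wrong label, so the threshold is rarely met for adversarial inputs.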

Temporal Action Localization using Long Short-Term Dependency


Title	Temporal Action Localization using Long Short-Term Dependency
Authors	Yuan Zhou, Hongru Li, Sun-Yuan Kung
Abstract	Temporal action localization in untrimmed videos is an important but difficult task. Difficulties are encountered in the application of existing methods when modeling temporal structures of videos. In the present study, we developed a novel method, referred to as Gemini Network, for effective modeling of temporal structures and achieving high-performance temporal action localization. The significant improvements afforded by the proposed method are attributable to three major factors. First, the developed network utilizes two subnets for effective modeling of temporal structures. Second, three parallel feature extraction pipelines are used to prevent interference between the extractions of different stage features. Third, the proposed method utilizes auxiliary supervision, with the auxiliary classifier losses affording additional constraints for improving the modeling capability of the network. As a demonstration of its effectiveness, the Gemini Network was used to achieve state-of-the-art temporal action localization performance on two challenging datasets, namely, THUMOS14 and ActivityNet.
Tasks	Action Localization, Temporal Action Localization
Published	2019-11-04
URL	https://arxiv.org/abs/1911.01060v1
PDF	https://arxiv.org/pdf/1911.01060v1.pdf
PWC	https://paperswithcode.com/paper/temporal-action-localization-using-long-short
Repo
Framework