January 29, 2020


Paper Group ANR 569

A Predictive Model for Steady-State Multiphase Pipe Flow: Machine Learning on Lab Data. Learning Target-oriented Dual Attention for Robust RGB-T Tracking. Image-based reconstruction for the impact problems by using DPNNs. Problems with automating translation of movie/TV show subtitles. Falls Prediction Based on Body Keypoints and Seq2Seq Architectu …

A Predictive Model for Steady-State Multiphase Pipe Flow: Machine Learning on Lab Data


Title	A Predictive Model for Steady-State Multiphase Pipe Flow: Machine Learning on Lab Data
Authors	Evgenii Kanin, Andrei Osiptsov, Albert Vainshtein, Evgeny Burnaev
Abstract	Engineering simulators used for steady-state multiphase pipe flows are commonly utilized to predict pressure drop. Such simulators are typically based on either empirical correlations or first-principles mechanistic models. The simulators allow evaluating the pressure drop in multiphase pipe flow with acceptable accuracy. However, the shortcoming of these correlations and mechanistic models is their limited applicability. In order to extend the applicability and the accuracy of the existing accessible methods, a method of pressure drop calculation in the pipeline is proposed. The method is based on well segmentation and calculation of the pressure gradient in each segment using three surrogate models based on Machine Learning (ML) algorithms trained on a representative lab data set from the open literature. The first model predicts the value of the liquid holdup in the segment, the second one determines the flow pattern, and the third one is used to estimate the pressure gradient. To build these models, several ML algorithms are trained, such as Random Forest, Gradient Boosting Decision Trees, Support Vector Machine, and Artificial Neural Network, and their predictive abilities are cross-compared. The proposed method for pressure gradient calculation yields $R^2 = 0.95$ by using the Gradient Boosting algorithm, as compared with $R^2 = 0.92$ in the case of the Mukherjee and Brill correlation and $R^2 = 0.91$ when a combination of the Ansari and Xiao mechanistic models is utilized. The method for pressure drop prediction is also validated on three real field cases. Validation indicates that the proposed model yields the following coefficients of determination: $R^2 = 0.806$, $0.815$ and $0.99$, as compared with the highest values obtained by commonly used techniques: $R^2 = 0.82$ (Beggs and Brill correlation), $R^2 = 0.823$ (Mukherjee and Brill correlation) and $R^2 = 0.98$ (Beggs and Brill correlation).
Tasks
Published	2019-05-23
URL	https://arxiv.org/abs/1905.09746v1
PDF	https://arxiv.org/pdf/1905.09746v1.pdf
PWC	https://paperswithcode.com/paper/a-predictive-model-for-steady-state
Repo
Framework
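
The segment-wise calculation described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the holdup prediction feeds the flow-pattern model and both feed the pressure-gradient model, and the names `pressure_profile`, `toy_holdup`, `toy_pattern` and `toy_gradient`, along with their constant return values, are hypothetical placeholders standing in for trained surrogates.

```python
from typing import Callable, List

# Hypothetical surrogate signatures (assumptions, not the paper's interfaces).
HoldupModel = Callable[[float], float]                # pressure -> liquid holdup
PatternModel = Callable[[float, float], str]          # pressure, holdup -> flow pattern
GradientModel = Callable[[float, float, str], float]  # -> pressure gradient in Pa/m


def pressure_profile(
    p_inlet: float,
    length_m: float,
    n_segments: int,
    holdup_model: HoldupModel,
    pattern_model: PatternModel,
    gradient_model: GradientModel,
) -> List[float]:
    """March along the pipe, applying the predicted gradient in each segment."""
    seg_len = length_m / n_segments
    pressures = [p_inlet]
    for _ in range(n_segments):
        p = pressures[-1]
        holdup = holdup_model(p)                    # first surrogate
        pattern = pattern_model(p, holdup)          # second surrogate
        dp_dl = gradient_model(p, holdup, pattern)  # third surrogate, loss per metre
        pressures.append(p - dp_dl * seg_len)
    return pressures


# Toy stand-ins for trained models (placeholder values only).
def toy_holdup(p: float) -> float:
    return 0.3


def toy_pattern(p: float, holdup: float) -> str:
    return "slug" if holdup > 0.2 else "annular"


def toy_gradient(p: float, holdup: float, pattern: str) -> float:
    return 100.0 if pattern == "slug" else 50.0


profile = pressure_profile(2.0e6, 1000.0, 10, toy_holdup, toy_pattern, toy_gradient)
total_drop = profile[0] - profile[-1]
print(f"total pressure drop: {total_drop:.1f} Pa")
```

In a real setting each toy function would be replaced by a fitted regressor or classifier, and the segment count would be chosen so that fluid properties change little within one segment.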

Learning Target-oriented Dual Attention for Robust RGB-T Tracking


Title	Learning Target-oriented Dual Attention for Robust RGB-T Tracking
Authors	Rui Yang, Yabin Zhu, Xiao Wang, Chenglong Li, Jin Tang
Abstract	RGB-Thermal object tracking attempts to locate a target object using complementary visual and thermal infrared data. Existing RGB-T trackers fuse different modalities by robust feature representation learning or adaptive modal weighting. However, how to integrate a dual attention mechanism for visual tracking has not yet been studied. In this paper, we propose two visual attention mechanisms for robust RGB-T object tracking. Specifically, the local attention is implemented by exploiting the common visual attention of RGB and thermal data to train deep classifiers. We also introduce the global attention, which is a multi-modal target-driven attention estimation network. It can provide global proposals for the classifier together with local proposals extracted from the previous tracking result. Extensive experiments on two RGB-T benchmark datasets validate the effectiveness of our proposed algorithm.
Tasks	Object Tracking, Representation Learning, Rgb-T Tracking, Visual Tracking
Published	2019-08-12
URL	https://arxiv.org/abs/1908.04441v1
PDF	https://arxiv.org/pdf/1908.04441v1.pdf
PWC	https://paperswithcode.com/paper/learning-target-oriented-dual-attention-for
Repo
Framework

Image-based reconstruction for the impact problems by using DPNNs


Title	Image-based reconstruction for the impact problems by using DPNNs
Authors	Yu Li, Hu Wang, Wenquan Shuai, Honghao Zhang, Yong Peng
Abstract	With the improvement of the pattern recognition and feature extraction of Deep Neural Networks (DPNNs), image-based design and optimization have been widely used in multidisciplinary research. Recently, a Reconstructive Neural Network (ReConNN) has been proposed to obtain an image-based model from an analysis-based model [1, 2], and a steady-state heat transfer of a heat sink has been successfully reconstructed. Commonly, this method is suitable for handling stable-state problems. However, it has difficulties handling nonlinear transient impact problems, due to the bottlenecks of the Deep Neural Network (DPNN). For example, nonlinear transient problems make it difficult for the Generative Adversarial Network (GAN) to generate various reasonable images. Therefore, in this study, an improved ReConNN method is proposed to address the mentioned weaknesses. Time-dependent ordered images can be generated. Furthermore, the improved method is successfully applied to an impact simulation case and an engineering experiment. Through the experiments, comparisons and analyses, the improved method is demonstrated to outperform the former one in terms of accuracy, efficiency and cost.
Tasks
Published	2019-04-08
URL	https://arxiv.org/abs/1905.03229v3
PDF	https://arxiv.org/pdf/1905.03229v3.pdf
PWC	https://paperswithcode.com/paper/190503229
Repo
Framework

Problems with automating translation of movie/TV show subtitles


Title	Problems with automating translation of movie/TV show subtitles
Authors	Prabhakar Gupta, Mayank Sharma, Kartik Pitale, Keshav Kumar
Abstract	We present 27 problems encountered in automating the translation of movie/TV show subtitles. We categorize each problem into one of three categories, viz. problems directly related to textual translation, problems related to subtitle creation guidelines, and problems due to adaptability of machine translation (MT) engines. We also present the findings of a translation quality evaluation experiment where we share the frequency of 16 key problems. We show that the systems working at the frontiers of Natural Language Processing do not perform well for subtitles and require some post-processing solutions for redressal of these problems.
Tasks	Machine Translation
Published	2019-09-04
URL	https://arxiv.org/abs/1909.05362v1
PDF	https://arxiv.org/pdf/1909.05362v1.pdf
PWC	https://paperswithcode.com/paper/problems-with-automating-translation-of
Repo
Framework

Falls Prediction Based on Body Keypoints and Seq2Seq Architecture


Title	Falls Prediction Based on Body Keypoints and Seq2Seq Architecture
Authors	Minjie Hua, Yibing Nan, Shiguo Lian
Abstract	This paper presents a novel approach for predicting the falls of people in advance from monocular video. First, all persons in the observed frames are detected and tracked, and the coordinates of their body keypoints are extracted at the same time. A keypoints vectorization method is exploited to eliminate irrelevant information in the initial coordinate representation. Then, the observed keypoint sequence of each person is input to the pose prediction module, adapted from the sequence-to-sequence (seq2seq) architecture, to predict the future keypoint sequence. Finally, the predicted pose is analyzed by the falls classifier to judge whether the person will fall down in the future. The pose prediction module and falls classifier are trained separately and tuned jointly using the Le2i dataset, which contains 191 videos of various normal daily activities as well as falls performed by several actors. The contrast experiments with mainstream raw RGB-based models show the accuracy improvement of utilizing body keypoints in falls classification. Moreover, the precognition of falls is proved effective by comparisons between models with and without the pose prediction module.
Tasks	Pose Prediction
Published	2019-08-01
URL	https://arxiv.org/abs/1908.00275v2
PDF	https://arxiv.org/pdf/1908.00275v2.pdf
PWC	https://paperswithcode.com/paper/falls-prediction-based-on-body-keypoints-and
Repo
Framework

Style transfer-based image synthesis as an efficient regularization technique in deep learning


Title	Style transfer-based image synthesis as an efficient regularization technique in deep learning
Authors	Agnieszka Mikołajczyk, Michał Grochowski
Abstract	These days deep learning is the fastest-growing area in the field of Machine Learning. Convolutional Neural Networks are currently the main tool used for image analysis and classification purposes. Despite great achievements and prospects, deep neural networks and accompanying learning algorithms have some relevant challenges to tackle. In this paper, we have focused on the most frequently mentioned problem in the field of machine learning, that is, relatively poor generalization abilities. Partial remedies for this are regularization techniques, e.g. dropout, batch normalization, weight decay, transfer learning, early stopping and data augmentation. In this paper, we have focused on data augmentation. We propose to use a method based on neural style transfer, which allows generating new unlabeled images of high perceptual quality that combine the content of a base image with the appearance of another one. In the proposed approach, the newly created images are described with pseudo-labels, and then used as a training dataset. Real, labeled images are divided into the validation and test set. We validated the proposed method on a challenging skin lesion classification case study. Four representative neural architectures are examined. The obtained results show the strong potential of the proposed approach.
Tasks	Data Augmentation, Image Generation, Skin Lesion Classification, Style Transfer, Transfer Learning
Published	2019-05-27
URL	https://arxiv.org/abs/1905.10974v1
PDF	https://arxiv.org/pdf/1905.10974v1.pdf
PWC	https://paperswithcode.com/paper/style-transfer-based-image-synthesis-as-an
Repo
Framework

RTOP: A Conceptual and Computational Framework for General Intelligence


Title	RTOP: A Conceptual and Computational Framework for General Intelligence
Authors	Shilpesh Garg
Abstract	A novel general intelligence model is proposed with three types of learning. A unified sequence of the foreground percept trace and the command trace translates into direct and time-hop observation paths to form the basis of Raw learning. Raw learning includes the formation of image-image associations, which lead to the perception of temporal and spatial relationships among objects and object parts; and the formation of image-audio associations, which serve as the building blocks of language. Offline identification of similar segments in the observation paths and their subsequent reduction into a common segment through merging of memory nodes leads to Generalized learning. Generalization includes the formation of interpolated sensory nodes for robust and generic matching, the formation of sensory properties nodes for specific matching and superimposition, and the formation of group nodes for simpler logic pathways. Online superimposition of memory nodes across multiple predictions, primarily the superimposition of images on the internal projection canvas, gives rise to Innovative learning and thought. The learning of actions happens the same way as raw learning while the action determination happens through the utility model built into the raw learnings, the utility function being the pleasure and pain of the physical senses.
Tasks
Published	2019-10-23
URL	https://arxiv.org/abs/1910.10393v2
PDF	https://arxiv.org/pdf/1910.10393v2.pdf
PWC	https://paperswithcode.com/paper/rtop-a-conceptual-and-computational-framework
Repo
Framework

Unimodal-uniform Constrained Wasserstein Training for Medical Diagnosis


Title	Unimodal-uniform Constrained Wasserstein Training for Medical Diagnosis
Authors	Xiaofeng Liu, Xu Han, Yukai Qiao, Yi Ge, Lu Jun
Abstract	The labels in medical diagnosis tasks are usually discrete and successively distributed. For example, Diabetic Retinopathy (DR) diagnosis involves five health risk levels: no DR (0), mild DR (1), moderate DR (2), severe DR (3) and proliferative DR (4). This labeling system is common for medical disease. Previous methods usually construct a multi-binary-classification task or propose some re-parameterization schemes in the output unit. In this paper, we target this task from the perspective of the loss function. More specifically, the Wasserstein distance is utilized as an alternative, explicitly incorporating the inter-class correlations by pre-defining its ground metric. Then, the ground metric, which serves as a linear, convex or concave increasing function w.r.t. the Euclidean distance in a line, is explored from an optimization perspective. Meanwhile, this paper also proposes constructing smoothed target labels that model the inlier and outlier noises by using a unimodal-uniform mixture distribution. Different from the one-hot setting, the smoothed label endows the computation of the Wasserstein distance with more challenging features. With either a one-hot or a smoothed target label, this paper systematically derives the practical closed-form solution. We evaluate our method on several medical diagnosis tasks (e.g., Diabetic Retinopathy and Ultrasound Breast dataset) and achieve state-of-the-art performance.
Tasks	Medical Diagnosis
Published	2019-11-03
URL	https://arxiv.org/abs/1911.02475v1
PDF	https://arxiv.org/pdf/1911.02475v1.pdf
PWC	https://paperswithcode.com/paper/unimodal-uniform-constrained-wasserstein
Repo
Framework
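
The idea of a loss that respects class ordering can be illustrated with a short sketch. It relies on one standard fact: when the target is one-hot, all predicted mass must be transported to the target class, so the Wasserstein distance reduces to the expected ground cost. Everything else is an assumption for illustration and is not taken from the paper: the function names, the power-law ground metric, and the exponential shape used for the unimodal component of the smoothed label.

```python
import numpy as np


def wasserstein_one_hot(probs: np.ndarray, target: int, power: float = 1.0) -> float:
    """Wasserstein distance between `probs` and a one-hot label at `target`.

    Ground metric: |i - target| ** power (power = 1 linear, > 1 convex, < 1 concave).
    """
    classes = np.arange(len(probs))
    ground_cost = np.abs(classes - target) ** power
    return float(np.dot(probs, ground_cost))


def unimodal_uniform_label(
    n_classes: int, target: int, uniform_weight: float = 0.1, spread: float = 1.0
) -> np.ndarray:
    """Smoothed label: mixture of a unimodal peak at `target` and a uniform floor.

    The exponential peak is a hypothetical choice of unimodal distribution.
    """
    classes = np.arange(n_classes)
    unimodal = np.exp(-np.abs(classes - target) / spread)
    unimodal /= unimodal.sum()
    uniform = np.full(n_classes, 1.0 / n_classes)
    return (1.0 - uniform_weight) * unimodal + uniform_weight * uniform


# Two predictions with the same probability on the true class (2),
# but with the remaining mass placed near or far from it.
near = np.array([0.0, 0.2, 0.6, 0.2, 0.0])
far = np.array([0.2, 0.0, 0.6, 0.0, 0.2])

loss_near = wasserstein_one_hot(near, target=2)
loss_far = wasserstein_one_hot(far, target=2)
smoothed = unimodal_uniform_label(n_classes=5, target=2)

print(loss_near, loss_far)
print(smoothed.round(3))
```

Cross-entropy against a one-hot label would score `near` and `far` identically, because both assign 0.6 to the true class; the distance above penalizes `far` more because its errors lie further along the severity scale.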

Inference of visual field test performance from OCT volumes using deep learning


Title	Inference of visual field test performance from OCT volumes using deep learning
Authors	Stefan Maetschke, Bhavna Antony, Hiroshi Ishikawa, Gadi Wollstein, Joel Schuman, Rahil Garnavi
Abstract	Visual field tests (VFT) are pivotal for glaucoma diagnosis and conducted regularly to monitor disease progression. Here we address the question of to what degree aggregate VFT measurements such as Visual Field Index (VFI) and Mean Deviation (MD) can be inferred from Optical Coherence Tomography (OCT) scans of the Optic Nerve Head (ONH) or the macula. Accurate inference of VFT measurements from OCT could reduce examination time and cost. We propose a novel 3D Convolutional Neural Network (CNN) for this task and compare its accuracy with classical machine learning (ML) algorithms trained on common, segmentation-based OCT features employed for glaucoma diagnostics. Peak accuracies were achieved on ONH scans when inferring VFI with a Pearson Correlation (PC) of 0.88$\pm$0.035 for the CNN and a significantly lower (p $<$ 0.01) PC of 0.74$\pm$0.090 for the best-performing classical ML algorithm, a Random Forest regressor. Estimation of MD was equally accurate with a PC of 0.88$\pm$0.023 on ONH scans for the CNN.
Tasks
Published	2019-08-05
URL	https://arxiv.org/abs/1908.01428v3
PDF	https://arxiv.org/pdf/1908.01428v3.pdf
PWC	https://paperswithcode.com/paper/inference-of-visual-field-test-performance
Repo
Framework

A New Compensatory Genetic Algorithm-Based Method for Effective Compressed Multi-function Convolutional Neural Network Model Selection with Multi-Objective Optimization


Title	A New Compensatory Genetic Algorithm-Based Method for Effective Compressed Multi-function Convolutional Neural Network Model Selection with Multi-Objective Optimization
Authors	Luna M. Zhang
Abstract	In recent years, there have been many popular Convolutional Neural Networks (CNNs), such as Google’s Inception-V4, that have performed very well for various image classification problems. These commonly used CNN models usually use the same activation function, such as RELU, for all neurons in the convolutional layers; they are “Single-function CNNs” (SCNNs). However, SCNNs may not always be optimal. Thus, a “Multi-function CNN” (MCNN), which uses different activation functions for different neurons, has been shown to outperform an SCNN. Also, CNNs typically have very large architectures that use a lot of memory and need a lot of data in order to be trained well. As a result, they tend to have very high training and prediction times too. An important research problem is how to automatically and efficiently find the best CNN with both high classification performance and compact architecture with high training and prediction speeds, small power usage, and small memory size for any image classification problem. It is very useful to intelligently find an effective, fast, energy-efficient, and memory-efficient “Compressed Multi-function CNN” (CMCNN) from a large number of candidate MCNNs. A new compensatory algorithm using a new genetic algorithm (GA) is created to find the best CMCNN with an ideal compensation between performance and architecture size. The optimal CMCNN has the best performance and the smallest architecture size. Simulations using the CIFAR10 dataset showed that the new compensatory algorithm could find CMCNNs that could outperform non-compressed MCNNs in terms of classification performance (F1-score), speed, power usage, and memory usage. Other effective, fast, power-efficient, and memory-efficient CMCNNs based on popular CNN architectures will be developed for image classification problems in important real-world applications, such as brain informatics and biomedical imaging.
Tasks	Image Classification, Model Selection
Published	2019-06-08
URL	https://arxiv.org/abs/1906.11912v1
PDF	https://arxiv.org/pdf/1906.11912v1.pdf
PWC	https://paperswithcode.com/paper/a-new-compensatory-genetic-algorithm-based
Repo
Framework

Sinusoidal wave generating network based on adversarial learning and its application: synthesizing frog sounds for data augmentation


Title	Sinusoidal wave generating network based on adversarial learning and its application: synthesizing frog sounds for data augmentation
Authors	Sangwook Park, David K. Han, Hanseok Ko
Abstract	Simulators that generate observations based on theoretical models can be important tools for development, prediction, and assessment of signal processing algorithms. In order to design these simulators, painstaking effort is required to construct mathematical models according to their application. Complex models are sometimes necessary to represent a variety of real phenomena. In contrast, obtaining synthetic observations from generative models developed from real observations often requires much less effort. This paper proposes a generative model based on adversarial learning. Given that observations are typically signals composed of a linear combination of sinusoidal waves and random noises, sinusoidal wave generating networks are first designed based on an adversarial network. Audio waveform generation can then be performed using the proposed network. Several approaches to designing the objective function of the proposed network using adversarial learning are investigated experimentally. In addition, amphibian sound classification is performed using a convolutional neural network trained with real and synthetic sounds. Both qualitative and quantitative results show that the proposed generative model makes realistic signals and is very helpful for data augmentation and data analysis.
Tasks	Data Augmentation
Published	2019-01-07
URL	http://arxiv.org/abs/1901.02050v1
PDF	http://arxiv.org/pdf/1901.02050v1.pdf
PWC	https://paperswithcode.com/paper/sinusoidal-wave-generating-network-based-on
Repo
Framework
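
The signal model that the abstract above starts from, a linear combination of sinusoidal waves plus random noise, can be written down directly. The sketch below only illustrates that signal model and is not the adversarial network itself; the function name `synth_signal`, the chosen frequencies, amplitudes, sample rate and noise level are all assumptions made for the example.

```python
import numpy as np


def synth_signal(freqs_hz, amps, phases, duration_s=1.0, sample_rate=8000,
                 noise_std=0.01, seed=0):
    """Return (t, x) where x is a sum of sinusoids plus Gaussian noise."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(duration_s * sample_rate)) / sample_rate
    x = np.zeros_like(t)
    for f, a, ph in zip(freqs_hz, amps, phases):
        x += a * np.sin(2.0 * np.pi * f * t + ph)  # one sinusoidal component
    x += rng.normal(0.0, noise_std, size=t.shape)  # additive random noise
    return t, x


# Two components: a strong 440 Hz tone and a weaker 880 Hz overtone.
t, x = synth_signal(freqs_hz=[440.0, 880.0], amps=[1.0, 0.3], phases=[0.0, 0.5])

# With 1 s of audio at 8000 Hz, each FFT bin is 1 Hz wide.
spectrum = np.abs(np.fft.rfft(x))
dominant_hz = int(np.argmax(spectrum))
print(len(x), dominant_hz)
```

A generative network of the kind the abstract describes would learn the frequencies, amplitudes and phases from real recordings rather than take them as hand-set arguments.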

Towards a framework for the evolution of artificial general intelligence


Title	Towards a framework for the evolution of artificial general intelligence
Authors	Sidney Pontes-Filho, Stefano Nichele
Abstract	In this work, a novel framework for the emergence of general intelligence is proposed, where agents evolve through environmental rewards and learn throughout their lifetime without supervision, i.e., self-supervised learning through embodiment. The chosen control mechanism for agents is a biologically plausible neuron model based on spiking neural networks. Network topologies become more complex through evolution, i.e., the topology is not fixed, while the synaptic weights of the networks cannot be inherited, i.e., newborn brains are not trained and have no innate knowledge of the environment. What is subject to the evolutionary process is the network topology, the type of neurons, and the type of learning. This process ensures that controllers that are passed through the generations have the intrinsic ability to learn and adapt during their lifetime in mutable environments. We envision that the described approach may lead to the emergence of the simplest form of artificial general intelligence.
Tasks
Published	2019-03-25
URL	http://arxiv.org/abs/1903.10410v3
PDF	http://arxiv.org/pdf/1903.10410v3.pdf
PWC	https://paperswithcode.com/paper/towards-a-framework-for-the-evolution-of
Repo
Framework


Opinion shaping in social networks using reinforcement learning


Title	Opinion shaping in social networks using reinforcement learning
Authors	Vivek Borkar, Alexandre Reiffers-Masson
Abstract	In this paper, we study how to shape opinions in social networks when the matrix of interactions is unknown. We consider classical opinion dynamics with some stubborn agents and the possibility of continuously influencing the opinions of a few selected agents, albeit under resource constraints. We map the opinion dynamics to a value iteration scheme for policy evaluation for a specific stochastic shortest path problem. This leads to a representation of the opinion vector as an approximate value function for a stochastic shortest path problem with some non-classical constraints. We suggest two possible ways of influencing agents. One leads to a convex optimization problem and the other to a non-convex one. Firstly, for both problems, we propose two different online two-time scale reinforcement learning schemes that converge to the optimal solution of each problem. Secondly, we suggest stochastic gradient descent schemes and compare these classes of algorithms with the two-time scale reinforcement learning schemes. Thirdly, we also derive another algorithm designed to tackle the curse of dimensionality one faces when all agents are observed. Numerical studies are provided to illustrate the convergence and efficiency of our algorithms.
Tasks
Published	2019-10-19
URL	https://arxiv.org/abs/1910.08802v1
PDF	https://arxiv.org/pdf/1910.08802v1.pdf
PWC	https://paperswithcode.com/paper/opinion-shaping-in-social-networks-using
Repo
Framework
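
The classical opinion dynamics with stubborn agents mentioned in the abstract above can be illustrated with a small fixed-point iteration. This is a sketch under simplifying assumptions: the interaction matrix is taken as known (the paper studies the case where it is unknown), the update is plain repeated averaging, and the four-agent line network, the name `run_opinions` and the step count are hypothetical choices for the example.

```python
import numpy as np


def run_opinions(weights, x0, stubborn, n_steps=200):
    """Iterate x <- W x while holding stubborn agents at their initial opinion."""
    x = np.asarray(x0, dtype=float).copy()
    stubborn = np.asarray(stubborn, dtype=bool)
    fixed = x[stubborn].copy()
    for _ in range(n_steps):
        x = weights @ x       # each agent averages the opinions it listens to
        x[stubborn] = fixed   # stubborn agents never move
    return x


# Line network 0 - 1 - 2 - 3; agents 1 and 2 average their two neighbours.
W = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 0.0, 1.0],
])
x0 = [0.0, 0.9, 0.1, 1.0]                 # agents 0 and 3 hold opinions 0 and 1
stubborn = [True, False, False, True]

final = run_opinions(W, x0, stubborn)
print(final.round(4))
```

The non-stubborn agents settle at values interpolating between the two stubborn opinions, regardless of where they started, which is the fixed point that the abstract relates to a value function of a stochastic shortest path problem.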

Making the Cut: A Bandit-based Approach to Tiered Interviewing


Title	Making the Cut: A Bandit-based Approach to Tiered Interviewing
Authors	Candice Schumann, Zhi Lang, Jeffrey S. Foster, John P. Dickerson
Abstract	Given a huge set of applicants, how should a firm allocate sequential resume screenings, phone interviews, and in-person site visits? In a tiered interview process, later stages (e.g., in-person visits) are more informative, but also more expensive than earlier stages (e.g., resume screenings). Using accepted hiring models and the concept of structured interviews, a best practice in human resources, we cast tiered hiring as a combinatorial pure exploration (CPE) problem in the stochastic multi-armed bandit setting. The goal is to select a subset of arms (in our case, applicants) with some combinatorial structure. We present new algorithms in both the probably approximately correct (PAC) and fixed-budget settings that select a near-optimal cohort with provable guarantees. We show via simulations on real data from one of the largest US-based computer science graduate programs that our algorithms make better hiring decisions or use less budget than the status quo.
Tasks
Published	2019-06-23
URL	https://arxiv.org/abs/1906.09621v2
PDF	https://arxiv.org/pdf/1906.09621v2.pdf
PWC	https://paperswithcode.com/paper/making-the-cut-a-bandit-based-approach-to
Repo
Framework

Deep Contextualized Biomedical Abbreviation Expansion


Title	Deep Contextualized Biomedical Abbreviation Expansion
Authors	Qiao Jin, Jinling Liu, Xinghua Lu
Abstract	Automatic identification and expansion of ambiguous abbreviations are essential for biomedical natural language processing applications, such as information retrieval and question answering systems. In this paper, we present the DEep Contextualized Biomedical Abbreviation Expansion (DECBAE) model. DECBAE automatically collects substantial and relatively clean annotated contexts for 950 ambiguous abbreviations from PubMed abstracts using a simple heuristic. Then it utilizes BioELMo to extract the contextualized features of words, and feeds those features to abbreviation-specific bidirectional LSTMs, where the hidden states of the ambiguous abbreviations are used to assign the exact definitions. Our DECBAE model outperforms other baselines by large margins, achieving average accuracy of 0.961 and macro-F1 of 0.917 on the dataset. It also surpasses human performance for expanding a sample abbreviation, and remains robust in imbalanced, low-resource and clinical settings.
Tasks	Information Retrieval, Question Answering
Published	2019-06-08
URL	https://arxiv.org/abs/1906.03360v1
PDF	https://arxiv.org/pdf/1906.03360v1.pdf
PWC	https://paperswithcode.com/paper/deep-contextualized-biomedical-abbreviation
Repo
Framework