October 17, 2019


Paper Group ANR 683


Tensorized Self-Attention: Efficiently Modeling Pairwise and Global Dependencies Together


Title	Tensorized Self-Attention: Efficiently Modeling Pairwise and Global Dependencies Together
Authors	Tao Shen, Tianyi Zhou, Guodong Long, Jing Jiang, Chengqi Zhang
Abstract	Neural networks equipped with self-attention have parallelizable computation, light-weight structure, and the ability to capture both long-range and local dependencies. Further, their expressive power and performance can be boosted by using a vector to measure pairwise dependency, but this requires expanding the alignment matrix to a tensor, which results in memory and computation bottlenecks. In this paper, we propose a novel attention mechanism called “Multi-mask Tensorized Self-Attention” (MTSA), which is as fast and as memory-efficient as a CNN, but significantly outperforms previous CNN-/RNN-/attention-based models. MTSA 1) captures both pairwise (token2token) and global (source2token) dependencies by a novel compatibility function composed of dot-product and additive attentions, 2) uses a tensor to represent the feature-wise alignment scores for better expressive power but only requires parallelizable matrix multiplications, and 3) combines multi-head with multi-dimensional attentions, and applies a distinct positional mask to each head (subspace), so the memory and computation can be distributed to multiple heads, each with sequential information encoded independently. The experiments show that a CNN/RNN-free model based on MTSA achieves state-of-the-art or competitive performance on nine NLP benchmarks with compelling memory- and time-efficiency.
Tasks
Published	2018-05-02
URL	http://arxiv.org/abs/1805.00912v4
PDF	http://arxiv.org/pdf/1805.00912v4.pdf
PWC	https://paperswithcode.com/paper/fast-directional-self-attention-mechanism
Repo
Framework
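
The abstract's claim that a feature-wise alignment tensor "only requires parallelizable matrix multiplications" can be illustrated with a small NumPy sketch. It assumes (this is not stated in the abstract) that the tensor score for query i, key j and feature k is the sum of a scalar token2token score and a per-feature source2token score; under that assumption the softmax over keys factorizes, so the tensor never has to be materialized. The function names, shapes and the toy score functions are hypothetical, and positional masks and multiple heads are left out, so this is not the authors' implementation.

```python
import numpy as np

def tensor_attention_explicit(pair_scores, global_scores, values):
    """Reference version: builds the full (n, n, d) alignment tensor."""
    # scores[i, j, k] = pair_scores[i, j] + global_scores[j, k]
    scores = pair_scores[:, :, None] + global_scores[None, :, :]
    scores = scores - scores.max(axis=1, keepdims=True)  # stabilise the softmax over keys j
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return (weights * values[None, :, :]).sum(axis=1)     # shape (n, d)

def tensor_attention_matmul(pair_scores, global_scores, values):
    """Same result using only (n, n) and (n, d) matrices."""
    # exp(a_ij + b_jk) = exp(a_ij) * exp(b_jk); the subtracted constants cancel in the ratio.
    e_pair = np.exp(pair_scores - pair_scores.max(axis=1, keepdims=True))
    e_glob = np.exp(global_scores - global_scores.max(axis=0, keepdims=True))
    numerator = e_pair @ (e_glob * values)
    denominator = e_pair @ e_glob
    return numerator / denominator

rng = np.random.default_rng(0)
n_tokens, n_features = 6, 4
tokens = rng.normal(size=(n_tokens, n_features))
# Toy scores (assumptions): scaled dot-product for token2token, tanh projection for source2token.
pair_scores = tokens @ tokens.T / np.sqrt(n_features)
global_scores = np.tanh(tokens @ rng.normal(size=(n_features, n_features)))

out_explicit = tensor_attention_explicit(pair_scores, global_scores, tokens)
out_matmul = tensor_attention_matmul(pair_scores, global_scores, tokens)
```

Both functions return an array of shape (n_tokens, n_features); the second one needs memory proportional to n² + n·d rather than n²·d.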

Combining Learned Lyrical Structures and Vocabulary for Improved Lyric Generation


Title	Combining Learned Lyrical Structures and Vocabulary for Improved Lyric Generation
Authors	Pablo Samuel Castro, Maria Attarian
Abstract	The use of language models for generating lyrics and poetry has received an increased interest in the last few years. They pose a unique challenge relative to standard natural language problems: as their ultimate purpose is creative, notions of accuracy and reproducibility are secondary to notions of lyricism, structure, and diversity. In this creative setting, traditional quantitative measures for natural language problems, such as BLEU scores, prove inadequate: a high-scoring model may either fail to produce output respecting the desired structure (e.g. song verses), be a terribly boring creative companion, or both. In this work we propose a mechanism for combining two separately trained language models into a framework that is able to produce output respecting the desired song structure, while providing a richness and diversity of vocabulary that renders it more creatively appealing.
Tasks
Published	2018-11-12
URL	http://arxiv.org/abs/1811.04651v1
PDF	http://arxiv.org/pdf/1811.04651v1.pdf
PWC	https://paperswithcode.com/paper/combining-learned-lyrical-structures-and
Repo
Framework

Group-Attention Single-Shot Detector (GA-SSD): Finding Pulmonary Nodules in Large-Scale CT Images


Title	Group-Attention Single-Shot Detector (GA-SSD): Finding Pulmonary Nodules in Large-Scale CT Images
Authors	Jiechao Ma, Xiang Li, Hongwei Li, Bjoern H Menze, Sen Liang, Rongguo Zhang, Wei-Shi Zheng
Abstract	Early diagnosis of pulmonary nodules (PNs) can improve the survival rate of patients and yet is a challenging task for radiologists due to the image noise and artifacts in computed tomography (CT) images. In this paper, we propose a novel and effective abnormality detector implementing the attention mechanism and group convolution on a 3D single-shot detector (SSD), called group-attention SSD (GA-SSD). We find that group convolution is effective in extracting rich context information between continuous slices, and that the attention network can learn the target features automatically. We collected a large-scale dataset that contained 4146 CT scans with annotations of varying types and sizes of PNs (even PNs smaller than 3mm were annotated). To the best of our knowledge, this dataset is the largest cohort with relatively complete annotations for PN detection. Our experimental results show that the proposed group-attention SSD outperforms the classic SSD framework as well as the state-of-the-art 3DCNN, especially on some challenging lesion types.
Tasks	Computed Tomography (CT), Finding Pulmonary Nodules In Large-Scale CT Images
Published	2018-12-18
URL	https://arxiv.org/abs/1812.07166v2
PDF	https://arxiv.org/pdf/1812.07166v2.pdf
PWC	https://paperswithcode.com/paper/group-attention-single-shot-detector-ga-ssd
Repo
Framework

Adaptive View Planning for Aerial 3D Reconstruction


Title	Adaptive View Planning for Aerial 3D Reconstruction
Authors	Cheng Peng, Volkan Isler
Abstract	With the proliferation of small aerial vehicles, acquiring close up aerial imagery for high quality reconstruction of complex scenes is gaining importance. We present an adaptive view planning method to collect such images in an automated fashion. We start by sampling a small set of views to build a coarse proxy to the scene. We then present (i) a method that builds a view manifold for view selection, and (ii) an algorithm to select a sparse set of views. The vehicle then visits these viewpoints to cover the scene, and the procedure is repeated until reconstruction quality converges or a desired level of quality is achieved. The view manifold provides an effective efficiency/quality compromise between using the entire 6 degree of freedom pose space and using a single view hemisphere to select the views. Our results show that, in contrast to existing “explore and exploit” methods which collect only two sets of views, reconstruction quality can be drastically improved by adding a third set. They also indicate that three rounds of data collection are sufficient even for very complex scenes. We compare our algorithm to existing methods in three challenging scenes. We require each algorithm to select the same number of views. Our algorithm generates views which produce the least reconstruction error.
Tasks	3D Reconstruction
Published	2018-05-01
URL	https://arxiv.org/abs/1805.00506v2
PDF	https://arxiv.org/pdf/1805.00506v2.pdf
PWC	https://paperswithcode.com/paper/adaptive-view-planning-for-aerial-3d
Repo
Framework

Beliefs in Decision-Making Cascades


Title	Beliefs in Decision-Making Cascades
Authors	Daewon Seo, Ravi Kiran Raman, Joong Bum Rhim, Vivek K Goyal, Lav R Varshney
Abstract	This work explores a social learning problem with agents having nonidentical noise variances and mismatched beliefs. We consider an $N$-agent binary hypothesis test in which each agent sequentially makes a decision based not only on a private observation, but also on preceding agents’ decisions. In addition, the agents have their own beliefs instead of the true prior, and have nonidentical noise variances in the private signal. We focus on the Bayes risk of the last agent, where preceding agents are selfish. We first derive the optimal decision rule by recursive belief update and conclude, counterintuitively, that beliefs deviating from the true prior could be optimal in this setting. The effect of nonidentical noise levels in the two-agent case is also considered and analytical properties of the optimal belief curves are given. Next, we consider a predecessor selection problem wherein the subsequent agent of a certain belief chooses a predecessor from a set of candidates with varying beliefs. We characterize the decision region for choosing such a predecessor and argue that a subsequent agent with beliefs varying from the true prior often ends up selecting a suboptimal predecessor, indicating the need for a social planner. Lastly, we discuss an augmented intelligence design problem that uses a model of human behavior from cumulative prospect theory and investigate its near-optimality and suboptimality.
Tasks	Decision Making
Published	2018-11-23
URL	https://arxiv.org/abs/1812.04419v2
PDF	https://arxiv.org/pdf/1812.04419v2.pdf
PWC	https://paperswithcode.com/paper/beliefs-in-decision-making-cascades
Repo
Framework

Randomized Gradient Boosting Machine


Title	Randomized Gradient Boosting Machine
Authors	Haihao Lu, Rahul Mazumder
Abstract	Gradient Boosting Machine (GBM) introduced by Friedman is an extremely powerful supervised learning algorithm that is widely used in practice — it routinely features as a leading algorithm in machine learning competitions such as Kaggle and the KDDCup. In spite of the usefulness of GBM in practice, our current theoretical understanding of this method is rather limited. In this work, we propose Randomized Gradient Boosting Machine (RGBM) which leads to significant computational gains compared to GBM, by using a randomization scheme to reduce the search in the space of weak-learners. We derive novel computational guarantees for RGBM. We also provide a principled guideline towards better step-size selection in RGBM that does not require a line search. Our proposed framework is inspired by a special variant of coordinate descent that combines the benefits of randomized coordinate descent and greedy coordinate descent; and may be of independent interest as an optimization algorithm. As a special case, our results for RGBM lead to superior computational guarantees for GBM. Our computational guarantees depend upon a curious geometric quantity that we call Minimal Cosine Angle, which relates to the density of weak-learners in the prediction space. On a series of numerical experiments on real datasets, we demonstrate the effectiveness of RGBM over GBM in terms of obtaining a model with good training and/or testing data fidelity with a fraction of the computational cost.
Tasks
Published	2018-10-24
URL	http://arxiv.org/abs/1810.10158v2
PDF	http://arxiv.org/pdf/1810.10158v2.pdf
PWC	https://paperswithcode.com/paper/randomized-gradient-boosting-machine
Repo
Framework
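
The randomization scheme in the abstract (restrict each boosting step to a random subset of weak learners, then pick the best one inside that subset) can be sketched for a least-squares loss. The setup below is an assumption made for illustration: the weak learners are fixed columns of a prediction matrix, the step is a constant fraction of the exact least-squares step, and the function name and parameters are hypothetical. It is not the paper's algorithm or its step-size rule.

```python
import numpy as np

def rgbm_least_squares(learner_outputs, targets, subset_size,
                       n_iters=200, step=0.5, seed=0):
    """Random-then-greedy boosting on a fixed dictionary of weak learners.

    learner_outputs: (n_samples, n_learners), column j holds learner j's predictions.
    subset_size: number of learners examined per iteration;
                 subset_size == n_learners recovers plain greedy boosting.
    """
    rng = np.random.default_rng(seed)
    n_learners = learner_outputs.shape[1]
    coefs = np.zeros(n_learners)
    residual = targets.astype(float).copy()
    sq_norms = (learner_outputs ** 2).sum(axis=0)
    losses = [0.5 * residual @ residual]
    for _ in range(n_iters):
        # Randomization: only a random subset of learners is searched.
        candidates = rng.choice(n_learners, size=subset_size, replace=False)
        correlations = learner_outputs[:, candidates].T @ residual
        best = candidates[np.argmax(np.abs(correlations))]
        # Damped version of the exact least-squares step for the chosen learner.
        delta = step * (learner_outputs[:, best] @ residual) / sq_norms[best]
        coefs[best] += delta
        residual -= delta * learner_outputs[:, best]
        losses.append(0.5 * residual @ residual)
    return coefs, np.array(losses)

data_rng = np.random.default_rng(1)
learners = data_rng.normal(size=(50, 20))
true_coefs = np.zeros(20)
true_coefs[[2, 7, 11]] = [1.5, -2.0, 0.7]
y = learners @ true_coefs + 0.1 * data_rng.normal(size=50)

coefs, losses = rgbm_least_squares(learners, y, subset_size=5)
```

Each iteration costs one matrix-vector product over `subset_size` columns instead of all of them, which is where the computational saving over the fully greedy search comes from in this toy setting.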

Causal Inference on Discrete Data via Estimating Distance Correlations


Title	Causal Inference on Discrete Data via Estimating Distance Correlations
Authors	Furui Liu, Laiwan Chan
Abstract	In this paper, we deal with the problem of inferring causal directions when the data is on a discrete domain. By considering the distribution of the cause $P(X)$ and the conditional distribution mapping cause to effect $P(Y|X)$ as independent random variables, we propose to infer the causal direction via comparing the distance correlation between $P(X)$ and $P(Y|X)$ with the distance correlation between $P(Y)$ and $P(X|Y)$. We infer “$X$ causes $Y$” if the dependence coefficient between $P(X)$ and $P(Y|X)$ is smaller. Experiments are performed to show the performance of the proposed method.
Tasks	Causal Inference
Published	2018-03-21
URL	http://arxiv.org/abs/1803.07712v3
PDF	http://arxiv.org/pdf/1803.07712v3.pdf
PWC	https://paperswithcode.com/paper/causal-inference-on-discrete-data-via
Repo
Framework
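
The decision rule in the abstract can be sketched with the standard sample distance correlation. How the samples are formed is an assumption here: each value x of the candidate cause contributes one pair (P(X=x), P(Y|X=x)), with the conditional treated as a vector. The function names and the example joint table are hypothetical, and the estimator used in the paper may differ.

```python
import numpy as np

def _centered_distances(samples):
    """Double-centered pairwise Euclidean distance matrix of the samples."""
    samples = np.asarray(samples, dtype=float)
    if samples.ndim == 1:
        samples = samples[:, None]
    diffs = samples[:, None, :] - samples[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))
    return (dist - dist.mean(axis=0, keepdims=True)
            - dist.mean(axis=1, keepdims=True) + dist.mean())

def distance_correlation(a, b):
    """Sample distance correlation, a value in [0, 1]."""
    A, B = _centered_distances(a), _centered_distances(b)
    dcov2 = (A * B).mean()
    dvar_a, dvar_b = (A * A).mean(), (B * B).mean()
    if dvar_a == 0 or dvar_b == 0:
        return 0.0
    return float(np.sqrt(max(dcov2, 0.0) / np.sqrt(dvar_a * dvar_b)))

def infer_direction(joint):
    """joint[i, j] = P(X = i, Y = j); all marginals must be positive."""
    joint = np.asarray(joint, dtype=float)
    p_x, p_y = joint.sum(axis=1), joint.sum(axis=0)
    p_y_given_x = joint / p_x[:, None]        # one row per value of X
    p_x_given_y = (joint / p_y[None, :]).T    # one row per value of Y
    d_xy = distance_correlation(p_x, p_y_given_x)
    d_yx = distance_correlation(p_y, p_x_given_y)
    label = "X causes Y" if d_xy < d_yx else "Y causes X"
    return label, d_xy, d_yx

# Hypothetical joint distribution over 4 values of X and 3 values of Y.
joint = np.array([[0.10, 0.05, 0.05],
                  [0.02, 0.20, 0.03],
                  [0.05, 0.05, 0.15],
                  [0.10, 0.10, 0.10]])
label, d_xy, d_yx = infer_direction(joint)
self_corr = distance_correlation([0.1, 0.4, 0.2, 0.3], [0.1, 0.4, 0.2, 0.3])
```

With only a handful of distinct values per variable the two coefficients are estimated from very few samples, so the returned label should be read as an illustration of the rule rather than a reliable inference.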

Variational Bayesian Inference for Audio-Visual Tracking of Multiple Speakers


Title	Variational Bayesian Inference for Audio-Visual Tracking of Multiple Speakers
Authors	Yutong Ban, Xavier Alameda-Pineda, Laurent Girin, Radu Horaud
Abstract	In this paper we address the problem of tracking multiple speakers via the fusion of visual and auditory information. We propose to exploit the complementary nature of these two modalities in order to accurately estimate smooth trajectories of the tracked persons, to deal with the partial or total absence of one of the modalities over short periods of time, and to estimate the acoustic status – either speaking or silent – of each tracked person over time. We propose to cast the problem at hand into a generative audio-visual fusion (or association) model formulated as a latent-variable temporal graphical model. This may well be viewed as the problem of maximizing the posterior joint distribution of a set of continuous and discrete latent variables given the past and current observations, which is intractable. We propose a variational inference model which amounts to approximating the joint distribution with a factorized distribution. The solution takes the form of a closed-form expectation maximization procedure. We describe in detail the inference algorithm, we evaluate its performance and we compare it with several baseline methods. These experiments show that the proposed audio-visual tracker performs well in informal meetings involving a time-varying number of people.
Tasks	Bayesian Inference, Visual Tracking
Published	2018-09-28
URL	https://arxiv.org/abs/1809.10961v2
PDF	https://arxiv.org/pdf/1809.10961v2.pdf
PWC	https://paperswithcode.com/paper/variational-bayesian-inference-for-audio
Repo
Framework

Learning Weighted Representations for Generalization Across Designs


Title	Learning Weighted Representations for Generalization Across Designs
Authors	Fredrik D. Johansson, Nathan Kallus, Uri Shalit, David Sontag
Abstract	Predictive models that generalize well under distributional shift are often desirable and sometimes crucial to building robust and reliable machine learning applications. We focus on distributional shift that arises in causal inference from observational data and in unsupervised domain adaptation. We pose both of these problems as prediction under a shift in design. Popular methods for overcoming distributional shift make unrealistic assumptions such as having a well-specified model or knowing the policy that gave rise to the observed data. Other methods are hindered by their need for a pre-specified metric for comparing observations, or by poor asymptotic properties. We devise a bound on the generalization error under design shift, incorporating both representation learning and sample re-weighting. Based on the bound, we propose an algorithmic framework that does not require any of the above assumptions and which is asymptotically consistent. We empirically study the new framework using two synthetic datasets, and demonstrate its effectiveness compared to previous methods.
Tasks	Causal Inference, Domain Adaptation, Representation Learning, Unsupervised Domain Adaptation
Published	2018-02-23
URL	http://arxiv.org/abs/1802.08598v2
PDF	http://arxiv.org/pdf/1802.08598v2.pdf
PWC	https://paperswithcode.com/paper/learning-weighted-representations-for
Repo
Framework

Closing the loop on multisensory interactions: A neural architecture for multisensory causal inference and recalibration


Title	Closing the loop on multisensory interactions: A neural architecture for multisensory causal inference and recalibration
Authors	Jonathan Tong, German I. Parisi, Stefan Wermter, Brigitte Röder
Abstract	When the brain receives input from multiple sensory systems, it is faced with the question of whether it is appropriate to process the inputs in combination, as if they originated from the same event, or separately, as if they originated from distinct events. Furthermore, it must also have a mechanism through which it can keep sensory inputs calibrated to maintain the accuracy of its internal representations. We have developed a neural network architecture capable of i) approximating optimal multisensory spatial integration, based on Bayesian causal inference, and ii) recalibrating the spatial encoding of sensory systems. The architecture is based on features of the dorsal processing hierarchy, including the spatial tuning properties of unisensory neurons and the convergence of different sensory inputs onto multisensory neurons. Furthermore, we propose that these unisensory and multisensory neurons play dual roles in i) encoding spatial location as separate or integrated estimates and ii) accumulating evidence for the independence or relatedness of multisensory stimuli. We further propose that top-down feedback connections spanning the dorsal pathway play a key role in recalibrating spatial encoding at the level of early unisensory cortices. Our proposed architecture provides possible explanations for a number of human electrophysiological and neuroimaging results and generates testable predictions linking neurophysiology with behaviour.
Tasks	Causal Inference
Published	2018-02-19
URL	http://arxiv.org/abs/1802.06591v3
PDF	http://arxiv.org/pdf/1802.06591v3.pdf
PWC	https://paperswithcode.com/paper/closing-the-loop-on-multisensory-interactions
Repo
Framework

Can Neural Generators for Dialogue Learn Sentence Planning and Discourse Structuring?


Title	Can Neural Generators for Dialogue Learn Sentence Planning and Discourse Structuring?
Authors	Lena Reed, Shereen Oraby, Marilyn Walker
Abstract	Responses in task-oriented dialogue systems often realize multiple propositions whose ultimate form depends on the use of sentence planning and discourse structuring operations. For example, a recommendation may consist of an explicitly evaluative utterance, e.g. “Chanpen Thai is the best option”, along with content related by the justification discourse relation, e.g. “It has great food and service”, that combines multiple propositions into a single phrase. While neural generation methods integrate sentence planning and surface realization in one end-to-end learning framework, previous work has not shown that neural generators can: (1) perform common sentence planning and discourse structuring operations; (2) make decisions as to whether to realize content in a single sentence or over multiple sentences; (3) generalize sentence planning and discourse relation operations beyond what was seen in training. We systematically create large training corpora that exhibit particular sentence planning operations and then test neural models to see what they learn. We compare models without explicit latent variables for sentence planning with ones that provide explicit supervision during training. We show that only the models with additional supervision can reproduce sentence planning and discourse operations and generalize to situations unseen in training.
Tasks	Task-Oriented Dialogue Systems
Published	2018-09-09
URL	http://arxiv.org/abs/1809.03015v2
PDF	http://arxiv.org/pdf/1809.03015v2.pdf
PWC	https://paperswithcode.com/paper/can-neural-generators-for-dialogue-learn
Repo
Framework

Data Science as Political Action: Grounding Data Science in a Politics of Justice


Title	Data Science as Political Action: Grounding Data Science in a Politics of Justice
Authors	Ben Green
Abstract	In response to recent controversies, the field of data science has rushed to adopt codes of ethics. Such professional codes, however, are ill-equipped to address broad matters of social justice. Instead of ethics codes, I argue, the field must embrace politics. Data scientists must recognize themselves as political actors engaged in normative constructions of society and, as befits political work, evaluate their work according to its downstream material impacts on people’s lives. I justify this notion in two parts: first, by articulating why data scientists must recognize themselves as political actors, and second, by describing how the field can evolve toward a deliberative and rigorous grounding in a politics of social justice. Part 1 responds to three common arguments that have been invoked by data scientists when they are challenged to take political positions regarding their work. In confronting these arguments, I will demonstrate why attempting to remain apolitical is itself a political stance–a fundamentally conservative one–and why the field’s current attempts to promote “social good” dangerously rely on vague and unarticulated political assumptions. Part 2 proposes a framework for what a politically-engaged data science could look like and how to achieve it, recognizing the challenge of reforming the field in this manner. I conceptualize the process of incorporating politics into data science in four stages: becoming interested in directly addressing social issues, recognizing the politics underlying these issues, redirecting existing methods toward new applications, and, finally, developing new practices and methods that orient data science around a mission of social justice. The path ahead does not require data scientists to abandon their technical expertise, but it does entail expanding their notions of what problems to work on and how to engage with society.
Tasks
Published	2018-11-06
URL	http://arxiv.org/abs/1811.03435v2
PDF	http://arxiv.org/pdf/1811.03435v2.pdf
PWC	https://paperswithcode.com/paper/data-science-as-political-action-grounding
Repo
Framework

Computational ghost imaging using a field-programmable gate array


Title	Computational ghost imaging using a field-programmable gate array
Authors	Ikuo Hoshi, Tomoyoshi Shimobaba, Takashi Kakue, Tomoyoshi Ito
Abstract	Computational ghost imaging is a promising technique for single-pixel imaging because it is robust to disturbance and can be operated over broad wavelength bands, unlike common cameras. However, one disadvantage of this method is that it has a long calculation time for image reconstruction. In this paper, we have designed a dedicated calculation circuit that accelerated the process of computational ghost imaging. We implemented this circuit by using a field-programmable gate array, which reduced the calculation time for the circuit compared to a CPU. The dedicated circuit reconstructs images at a frame rate of 300 Hz.
Tasks	Image Reconstruction
Published	2018-10-10
URL	http://arxiv.org/abs/1810.05670v1
PDF	http://arxiv.org/pdf/1810.05670v1.pdf
PWC	https://paperswithcode.com/paper/computational-ghost-imaging-using-a-field
Repo
Framework
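
The abstract does not spell out the reconstruction that the circuit accelerates. A common choice in computational ghost imaging is the correlation between the single-pixel (bucket) signal and the known illumination patterns; the NumPy sketch below assumes that formulation. The object, pattern count, image size and function name are all hypothetical and unrelated to the hardware design in the paper.

```python
import numpy as np

def ghost_reconstruct(patterns, bucket):
    """Correlation reconstruction: <B * I> - <B><I>, evaluated per pixel.

    patterns: (n_patterns, height, width) known illumination patterns.
    bucket:   (n_patterns,) single-pixel detector readings.
    """
    bucket_fluct = bucket - bucket.mean()
    pattern_fluct = patterns - patterns.mean(axis=0)
    # Sum over the pattern index; the result has shape (height, width).
    return np.tensordot(bucket_fluct, pattern_fluct, axes=(0, 0)) / len(bucket)

rng = np.random.default_rng(0)
scene = np.zeros((8, 8))
scene[2:6, 3:5] = 1.0                                   # hypothetical object: a bright bar
patterns = rng.integers(0, 2, size=(4000, 8, 8)).astype(float)
bucket = (patterns * scene).sum(axis=(1, 2))            # simulated single-pixel readings

image = ghost_reconstruct(patterns, bucket)
similarity = np.corrcoef(image.ravel(), scene.ravel())[0, 1]
```

The cost is one multiply-accumulate per pixel per pattern, which is the kind of regular workload that maps naturally onto dedicated hardware.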

Frank-Wolfe Algorithm for the Exact Sparse Problem


Title	Frank-Wolfe Algorithm for the Exact Sparse Problem
Authors	Farah Cherfaoui, Valentin Emiya, Liva Ralaivola, Sandrine Anthoine
Abstract	In this paper, we study the properties of the Frank-Wolfe algorithm to solve the Exact Sparse reconstruction problem. We prove that when the dictionary is quasi-incoherent, at each iteration, the Frank-Wolfe algorithm picks up an atom indexed by the support. We also prove that when the dictionary is quasi-incoherent, there exists an iteration beyond which the algorithm converges exponentially fast.
Tasks
Published	2018-12-18
URL	http://arxiv.org/abs/1812.07201v1
PDF	http://arxiv.org/pdf/1812.07201v1.pdf
PWC	https://paperswithcode.com/paper/frank-wolfe-algorithm-for-the-exact-sparse
Repo
Framework
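
The atom-selection behaviour described in the abstract can be observed in a minimal Frank-Wolfe sketch for least squares over an l1 ball. The problem formulation (objective 0.5·||Dx − y||² with an l1-norm constraint), the exact line search, and the identity dictionary used in the demo are assumptions chosen to keep the example small; the function name and parameters are hypothetical.

```python
import numpy as np

def frank_wolfe_l1(dictionary, signal, radius, n_iters=50):
    """Frank-Wolfe for min 0.5 * ||D x - y||^2 subject to ||x||_1 <= radius."""
    n_atoms = dictionary.shape[1]
    x = np.zeros(n_atoms)
    picked = []
    for _ in range(n_iters):
        residual = dictionary @ x - signal
        grad = dictionary.T @ residual
        atom = int(np.argmax(np.abs(grad)))       # atom most correlated with the residual
        if abs(grad[atom]) < 1e-12:
            break                                 # gradient vanished: already optimal
        picked.append(atom)
        # Linear minimization over the l1 ball returns a signed, scaled basis vector.
        vertex = np.zeros(n_atoms)
        vertex[atom] = -radius * np.sign(grad[atom])
        direction = vertex - x
        image = dictionary @ direction
        denom = image @ image
        if denom == 0:
            break
        # Exact line search for the quadratic objective, clipped to [0, 1].
        gamma = float(np.clip(-(residual @ image) / denom, 0.0, 1.0))
        x = x + gamma * direction
    return x, picked

D = np.eye(5)                                     # trivially incoherent dictionary
x_true = np.array([3.0, 0.0, -2.0, 0.0, 0.0])     # support is {0, 2}
y = D @ x_true
x_hat, picked = frank_wolfe_l1(D, y, radius=np.abs(x_true).sum())
```

In this demo every entry of `picked` is 0 or 2, i.e. an index in the support of `x_true`, matching the behaviour the abstract proves for quasi-incoherent dictionaries.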

Deep Learning-Based Decoding for Constrained Sequence Codes


Title	Deep Learning-Based Decoding for Constrained Sequence Codes
Authors	Congzhe Cao, Duanshun Li, Ivan Fair
Abstract	Constrained sequence codes have been widely used in modern communication and data storage systems. Sequences encoded with constrained sequence codes satisfy constraints imposed by the physical channel, hence enabling efficient and reliable transmission of coded symbols. Traditional encoding and decoding of constrained sequence codes rely on table look-up, which is prone to errors that occur during transmission. In this paper, we introduce constrained sequence decoding based on deep learning. With multilayer perceptron (MLP) networks and convolutional neural networks (CNNs), we are able to achieve low bit error rates that are close to maximum a posteriori probability (MAP) decoding as well as improve the system throughput. Moreover, implementation of capacity-achieving fixed-length codes, where the complexity is prohibitively high with table look-up decoding, becomes practical with deep learning-based decoding.
Tasks
Published	2018-09-06
URL	http://arxiv.org/abs/1809.01859v1
PDF	http://arxiv.org/pdf/1809.01859v1.pdf
PWC	https://paperswithcode.com/paper/deep-learning-based-decoding-for-constrained
Repo
Framework