Paper Group ANR 683
Tensorized Self-Attention: Efficiently Modeling Pairwise and Global Dependencies Together
Title | Tensorized Self-Attention: Efficiently Modeling Pairwise and Global Dependencies Together |
Authors | Tao Shen, Tianyi Zhou, Guodong Long, Jing Jiang, Chengqi Zhang |
Abstract | Neural networks equipped with self-attention have parallelizable computation, light-weight structure, and the ability to capture both long-range and local dependencies. Further, their expressive power and performance can be boosted by using a vector to measure pairwise dependency, but this requires expanding the alignment matrix to a tensor, which results in memory and computation bottlenecks. In this paper, we propose a novel attention mechanism called “Multi-mask Tensorized Self-Attention” (MTSA), which is as fast and as memory-efficient as a CNN, but significantly outperforms previous CNN-/RNN-/attention-based models. MTSA 1) captures both pairwise (token2token) and global (source2token) dependencies by a novel compatibility function composed of dot-product and additive attentions, 2) uses a tensor to represent the feature-wise alignment scores for better expressive power but only requires parallelizable matrix multiplications, and 3) combines multi-head with multi-dimensional attentions, and applies a distinct positional mask to each head (subspace), so the memory and computation can be distributed to multiple heads, each with sequential information encoded independently. The experiments show that a CNN/RNN-free model based on MTSA achieves state-of-the-art or competitive performance on nine NLP benchmarks with compelling memory- and time-efficiency. |
Tasks | |
Published | 2018-05-02 |
URL | http://arxiv.org/abs/1805.00912v4 |
PDF | http://arxiv.org/pdf/1805.00912v4.pdf |
PWC | https://paperswithcode.com/paper/fast-directional-self-attention-mechanism |
Repo | |
Framework | |
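Point 3) of the abstract, a distinct positional mask per head over a split feature space, can be illustrated with a small sketch. It rests on assumptions not taken from the paper: scores are plain scalar dot products rather than MTSA's feature-wise (tensorized) compatibility function, the masks follow a forward/backward convention (a token attends only to earlier, or only to later, tokens), and all function names are hypothetical.

```python
import math

NEG_INF = float("-inf")


def positional_mask(n, direction):
    """Additive n x n mask: 0.0 where token i may attend to token j, -inf otherwise.

    Assumed convention: "forward" allows j < i, "backward" allows j > i.
    """
    if direction not in ("forward", "backward"):
        raise ValueError(f"unknown direction: {direction}")
    mask = []
    for i in range(n):
        row = []
        for j in range(n):
            allowed = j < i if direction == "forward" else j > i
            row.append(0.0 if allowed else NEG_INF)
        mask.append(row)
    return mask


def masked_softmax(scores, mask_row):
    """Softmax over scores + mask; returns all zeros if every position is masked."""
    logits = [s + m for s, m in zip(scores, mask_row)]
    finite = [v for v in logits if v != NEG_INF]
    if not finite:
        return [0.0] * len(logits)
    top = max(finite)  # subtract the max for numerical stability
    exps = [math.exp(v - top) if v != NEG_INF else 0.0 for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def masked_head(x, direction):
    """One attention head: scalar dot-product scores under one positional mask."""
    n, d = len(x), len(x[0])
    mask = positional_mask(n, direction)
    out = []
    for i in range(n):
        scores = [sum(a * b for a, b in zip(x[i], x[j])) for j in range(n)]
        weights = masked_softmax(scores, mask[i])
        out.append([sum(w * x[j][k] for j, w in enumerate(weights)) for k in range(d)])
    return out


def multi_mask_attention(x, directions):
    """Split features into one equal slice per head, apply each head's own mask,
    and concatenate the head outputs token by token."""
    d, h = len(x[0]), len(directions)
    if d % h:
        raise ValueError("feature size must be divisible by the number of heads")
    size = d // h
    heads = [
        masked_head([row[k * size:(k + 1) * size] for row in x], direction)
        for k, direction in enumerate(directions)
    ]
    return [sum((head[i] for head in heads), []) for i in range(len(x))]
```

Because each head sees only its own feature slice and its own mask, the heads can be computed independently. In this sketch a token with no allowed positions (the first token under the forward mask, the last under the backward mask) simply receives a zero vector for that head.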
Combining Learned Lyrical Structures and Vocabulary for Improved Lyric Generation
Title | Combining Learned Lyrical Structures and Vocabulary for Improved Lyric Generation |
Authors | Pablo Samuel Castro, Maria Attarian |
Abstract | The use of language models for generating lyrics and poetry has received increased interest in the last few years. They pose a unique challenge relative to standard natural language problems: because their ultimate purpose is creative, notions of accuracy and reproducibility are secondary to notions of lyricism, structure, and diversity. In this creative setting, traditional quantitative measures for natural language problems, such as BLEU scores, prove inadequate: a high-scoring model may fail to produce output respecting the desired structure (e.g. song verses), be a terribly boring creative companion, or both. In this work we propose a mechanism for combining two separately trained language models into a framework that is able to produce output respecting the desired song structure, while providing a richness and diversity of vocabulary that renders it more creatively appealing. |
Tasks | |
Published | 2018-11-12 |
URL | http://arxiv.org/abs/1811.04651v1 |
PDF | http://arxiv.org/pdf/1811.04651v1.pdf |
PWC | https://paperswithcode.com/paper/combining-learned-lyrical-structures-and |
Repo | |
Framework | |
Group-Attention Single-Shot Detector (GA-SSD): Finding Pulmonary Nodules in Large-Scale CT Images
Title | Group-Attention Single-Shot Detector (GA-SSD): Finding Pulmonary Nodules in Large-Scale CT Images |
Authors | Jiechao Ma, Xiang Li, Hongwei Li, Bjoern H Menze, Sen Liang, Rongguo Zhang, Wei-Shi Zheng |
Abstract | Early diagnosis of pulmonary nodules (PNs) can improve the survival rate of patients, yet it is a challenging task for radiologists due to image noise and artifacts in computed tomography (CT) images. In this paper, we propose a novel and effective abnormality detector, called group-attention SSD (GA-SSD), which implements an attention mechanism and group convolution on a 3D single-shot detector (SSD). We find that group convolution is effective in extracting rich context information between consecutive slices, and that the attention network can learn the target features automatically. We collected a large-scale dataset containing 4146 CT scans with annotations of varying types and sizes of PNs (even PNs smaller than 3 mm were annotated). To the best of our knowledge, this dataset is the largest cohort with relatively complete annotations for PN detection. Our experimental results show that the proposed group-attention SSD outperforms the classic SSD framework as well as the state-of-the-art 3DCNN, especially on some challenging lesion types. |
Tasks | Computed Tomography (CT), Finding Pulmonary Nodules In Large-Scale Ct Images |
Published | 2018-12-18 |
URL | https://arxiv.org/abs/1812.07166v2 |
PDF | https://arxiv.org/pdf/1812.07166v2.pdf |
PWC | https://paperswithcode.com/paper/group-attention-single-shot-detector-ga-ssd |
Repo | |
Framework | |
Adaptive View Planning for Aerial 3D Reconstruction
Title | Adaptive View Planning for Aerial 3D Reconstruction |
Authors | Cheng Peng, Volkan Isler |
Abstract | With the proliferation of small aerial vehicles, acquiring close-up aerial imagery for high-quality reconstruction of complex scenes is gaining importance. We present an adaptive view planning method to collect such images in an automated fashion. We start by sampling a small set of views to build a coarse proxy to the scene. We then present (i) a method that builds a view manifold for view selection, and (ii) an algorithm to select a sparse set of views. The vehicle then visits these viewpoints to cover the scene, and the procedure is repeated until reconstruction quality converges or a desired level of quality is achieved. The view manifold offers an effective efficiency/quality compromise between selecting views from the entire 6-degree-of-freedom pose space and selecting them from a single view hemisphere. Our results show that, in contrast to existing “explore and exploit” methods, which collect only two sets of views, reconstruction quality can be drastically improved by adding a third set. They also indicate that three rounds of data collection are sufficient even for very complex scenes. We compare our algorithm to existing methods in three challenging scenes, requiring each algorithm to select the same number of views. Our algorithm generates views which produce the least reconstruction error. |
Tasks | 3D Reconstruction |
Published | 2018-05-01 |
URL | https://arxiv.org/abs/1805.00506v2 |
PDF | https://arxiv.org/pdf/1805.00506v2.pdf |
PWC | https://paperswithcode.com/paper/adaptive-view-planning-for-aerial-3d |
Repo | |
Framework | |
Beliefs in Decision-Making Cascades
Title | Beliefs in Decision-Making Cascades |
Authors | Daewon Seo, Ravi Kiran Raman, Joong Bum Rhim, Vivek K Goyal, Lav R Varshney |
Abstract | This work explores a social learning problem with agents having nonidentical noise variances and mismatched beliefs. We consider an $N$-agent binary hypothesis test in which each agent sequentially makes a decision based not only on a private observation, but also on preceding agents’ decisions. In addition, the agents have their own beliefs instead of the true prior, and have nonidentical noise variances in the private signal. We focus on the Bayes risk of the last agent, where preceding agents are selfish. We first derive the optimal decision rule by recursive belief update and conclude, counterintuitively, that beliefs deviating from the true prior could be optimal in this setting. The effect of nonidentical noise levels in the two-agent case is also considered and analytical properties of the optimal belief curves are given. Next, we consider a predecessor selection problem wherein the subsequent agent of a certain belief chooses a predecessor from a set of candidates with varying beliefs. We characterize the decision region for choosing such a predecessor and argue that a subsequent agent with beliefs varying from the true prior often ends up selecting a suboptimal predecessor, indicating the need for a social planner. Lastly, we discuss an augmented intelligence design problem that uses a model of human behavior from cumulative prospect theory and investigate its near-optimality and suboptimality. |
Tasks | Decision Making |
Published | 2018-11-23 |
URL | https://arxiv.org/abs/1812.04419v2 |
PDF | https://arxiv.org/pdf/1812.04419v2.pdf |
PWC | https://paperswithcode.com/paper/beliefs-in-decision-making-cascades |
Repo | |
Framework | |
Randomized Gradient Boosting Machine
Title | Randomized Gradient Boosting Machine |
Authors | Haihao Lu, Rahul Mazumder |
Abstract | Gradient Boosting Machine (GBM), introduced by Friedman, is an extremely powerful supervised learning algorithm that is widely used in practice; it routinely features as a leading algorithm in machine learning competitions such as Kaggle and the KDDCup. In spite of the usefulness of GBM in practice, our current theoretical understanding of this method is rather limited. In this work, we propose Randomized Gradient Boosting Machine (RGBM), which leads to significant computational gains compared to GBM by using a randomization scheme to reduce the search in the space of weak-learners. We derive novel computational guarantees for RGBM. We also provide a principled guideline towards better step-size selection in RGBM that does not require a line search. Our proposed framework is inspired by a special variant of coordinate descent that combines the benefits of randomized coordinate descent and greedy coordinate descent, and that may be of independent interest as an optimization algorithm. As a special case, our results for RGBM lead to superior computational guarantees for GBM. Our computational guarantees depend upon a curious geometric quantity that we call Minimal Cosine Angle, which relates to the density of weak-learners in the prediction space. In a series of numerical experiments on real datasets, we demonstrate the effectiveness of RGBM over GBM in terms of obtaining a model with good training and/or testing data fidelity with a fraction of the computational cost. |
Tasks | |
Published | 2018-10-24 |
URL | http://arxiv.org/abs/1810.10158v2 |
PDF | http://arxiv.org/pdf/1810.10158v2.pdf |
PWC | https://paperswithcode.com/paper/randomized-gradient-boosting-machine |
Repo | |
Framework | |
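The idea of searching only a random part of the weak-learner space and then choosing greedily within it can be illustrated with squared loss. In this minimal sketch (the function name and parameters are hypothetical), the weak learners are a fixed, finite list of prediction vectors on the training points; each step draws a random subset of them, picks the one most correlated with the current residual, and moves along it with a constant multiplier `rho`. The paper's own step-size rule and weak-learner classes are not reproduced here.

```python
import math
import random


def rgbm_least_squares(y, learners, n_iter=200, subset_size=2, rho=1.0, seed=0):
    """Randomized boosting sketch with loss 0.5 * ||y - f||^2.

    y: targets; learners: candidate weak-learner predictions, each a list
    aligned with y. Returns the fitted values and the learner chosen per step.
    """
    rng = random.Random(seed)
    f = [0.0] * len(y)
    chosen = []
    for _ in range(n_iter):
        # For squared loss the negative gradient is simply the residual.
        residual = [yi - fi for yi, fi in zip(y, f)]

        def normalized_corr(j):
            b = learners[j]
            dot = sum(r * v for r, v in zip(residual, b))
            return abs(dot) / math.sqrt(sum(v * v for v in b))

        # Randomization: only a random subset of weak learners is searched.
        subset = rng.sample(range(len(learners)), subset_size)
        # Greedy step: the best learner within that subset.
        best = max(subset, key=normalized_corr)
        b = learners[best]
        coef = sum(r * v for r, v in zip(residual, b)) / sum(v * v for v in b)
        # With rho = 1.0 this is an exact minimization along b for squared loss.
        f = [fi + rho * coef * v for fi, v in zip(f, b)]
        chosen.append(best)
    return f, chosen
```

Setting `subset_size` to the total number of learners recovers a fully greedy search over all weak learners, while `subset_size=1` gives purely random selection; intermediate values trade per-step cost against progress, which mirrors the randomized/greedy coordinate-descent combination the abstract describes.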
Causal Inference on Discrete Data via Estimating Distance Correlations
Title | Causal Inference on Discrete Data via Estimating Distance Correlations |
Authors | Furui Liu, Laiwan Chan |
Abstract | In this paper, we deal with the problem of inferring causal directions when the data lie in a discrete domain. By considering the distribution of the cause $P(X)$ and the conditional distribution mapping cause to effect $P(Y \mid X)$ as independent random variables, we propose to infer the causal direction by comparing the distance correlation between $P(X)$ and $P(Y \mid X)$ with the distance correlation between $P(Y)$ and $P(X \mid Y)$. We infer “$X$ causes $Y$” if the dependence coefficient between $P(X)$ and $P(Y \mid X)$ is smaller. Experiments are performed to show the performance of the proposed method. |
Tasks | Causal Inference |
Published | 2018-03-21 |
URL | http://arxiv.org/abs/1803.07712v3 |
PDF | http://arxiv.org/pdf/1803.07712v3.pdf |
PWC | https://paperswithcode.com/paper/causal-inference-on-discrete-data-via |
Repo | |
Framework | |
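A minimal sketch of the decision rule, under one reading of the abstract: each distinct value c of the candidate cause contributes one sample, namely its estimated marginal probability P(c) and the estimated conditional distribution of the effect given c, and distance correlation is computed across those samples in each direction. The plug-in frequency estimates, the V-statistic form of distance correlation, and all function names are assumptions for illustration, not the authors' code.

```python
import math
from collections import Counter


def _double_centered(points):
    """Pairwise Euclidean distances, double-centered (rows and columns sum to 0)."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    means = [sum(row) / n for row in d]  # row means equal column means (symmetry)
    grand = sum(means) / n
    return [[d[i][j] - means[i] - means[j] + grand for j in range(n)] for i in range(n)]


def distance_correlation(xs, ys):
    """Sample (V-statistic) distance correlation between paired point lists."""
    n = len(xs)
    a, b = _double_centered(xs), _double_centered(ys)
    dcov2 = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n ** 2
    dvar2_x = sum(v * v for row in a for v in row) / n ** 2
    dvar2_y = sum(v * v for row in b for v in row) / n ** 2
    if dvar2_x == 0.0 or dvar2_y == 0.0:
        return 0.0  # one side is constant
    return math.sqrt(max(dcov2, 0.0) / math.sqrt(dvar2_x * dvar2_y))


def marginal_and_conditionals(pairs, cause_idx):
    """Plug-in estimates from observed (x, y) pairs: one sample per cause value c,
    consisting of (P(c),) and the vector P(effect | c) over all effect values."""
    effect_idx = 1 - cause_idx
    n = len(pairs)
    cause_counts = Counter(p[cause_idx] for p in pairs)
    joint = Counter((p[cause_idx], p[effect_idx]) for p in pairs)
    effect_values = sorted({p[effect_idx] for p in pairs})
    marginals, conditionals = [], []
    for c in sorted(cause_counts):
        marginals.append((cause_counts[c] / n,))
        conditionals.append(tuple(joint[(c, e)] / cause_counts[c] for e in effect_values))
    return marginals, conditionals


def infer_direction(pairs):
    """Return "X->Y" if P(X) and P(Y|X) are less dependent than P(Y) and P(X|Y)."""
    score_xy = distance_correlation(*marginal_and_conditionals(pairs, 0))
    score_yx = distance_correlation(*marginal_and_conditionals(pairs, 1))
    return "X->Y" if score_xy < score_yx else "Y->X"
```

The comparison needs a reasonable number of distinct values on each side: with only two cause values, this estimator returns 1 for any two distinct samples (or 0 if one side is constant), so the score carries no directional information.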
Variational Bayesian Inference for Audio-Visual Tracking of Multiple Speakers
Title | Variational Bayesian Inference for Audio-Visual Tracking of Multiple Speakers |
Authors | Yutong Ban, Xavier Alameda-Pineda, Laurent Girin, Radu Horaud |
Abstract | In this paper we address the problem of tracking multiple speakers via the fusion of visual and auditory information. We propose to exploit the complementary nature of these two modalities in order to accurately estimate smooth trajectories of the tracked persons, to deal with the partial or total absence of one of the modalities over short periods of time, and to estimate the acoustic status (either speaking or silent) of each tracked person over time. We propose to cast the problem at hand into a generative audio-visual fusion (or association) model formulated as a latent-variable temporal graphical model. This may well be viewed as the problem of maximizing the posterior joint distribution of a set of continuous and discrete latent variables given the past and current observations, which is intractable. We propose a variational inference model which amounts to approximating the joint distribution with a factorized distribution. The solution takes the form of a closed-form expectation-maximization procedure. We describe the inference algorithm in detail, evaluate its performance, and compare it with several baseline methods. These experiments show that the proposed audio-visual tracker performs well in informal meetings involving a time-varying number of people. |
Tasks | Bayesian Inference, Visual Tracking |
Published | 2018-09-28 |
URL | https://arxiv.org/abs/1809.10961v2 |
PDF | https://arxiv.org/pdf/1809.10961v2.pdf |
PWC | https://paperswithcode.com/paper/variational-bayesian-inference-for-audio |
Repo | |
Framework | |
Learning Weighted Representations for Generalization Across Designs
Title | Learning Weighted Representations for Generalization Across Designs |
Authors | Fredrik D. Johansson, Nathan Kallus, Uri Shalit, David Sontag |
Abstract | Predictive models that generalize well under distributional shift are often desirable and sometimes crucial to building robust and reliable machine learning applications. We focus on distributional shift that arises in causal inference from observational data and in unsupervised domain adaptation. We pose both of these problems as prediction under a shift in design. Popular methods for overcoming distributional shift make unrealistic assumptions such as having a well-specified model or knowing the policy that gave rise to the observed data. Other methods are hindered by their need for a pre-specified metric for comparing observations, or by poor asymptotic properties. We devise a bound on the generalization error under design shift, incorporating both representation learning and sample re-weighting. Based on the bound, we propose an algorithmic framework that does not require any of the above assumptions and which is asymptotically consistent. We empirically study the new framework using two synthetic datasets, and demonstrate its effectiveness compared to previous methods. |
Tasks | Causal Inference, Domain Adaptation, Representation Learning, Unsupervised Domain Adaptation |
Published | 2018-02-23 |
URL | http://arxiv.org/abs/1802.08598v2 |
PDF | http://arxiv.org/pdf/1802.08598v2.pdf |
PWC | https://paperswithcode.com/paper/learning-weighted-representations-for |
Repo | |
Framework | |
Closing the loop on multisensory interactions: A neural architecture for multisensory causal inference and recalibration
Title | Closing the loop on multisensory interactions: A neural architecture for multisensory causal inference and recalibration |
Authors | Jonathan Tong, German I. Parisi, Stefan Wermter, Brigitte Röder |
Abstract | When the brain receives input from multiple sensory systems, it is faced with the question of whether it is appropriate to process the inputs in combination, as if they originated from the same event, or separately, as if they originated from distinct events. It must also have a mechanism through which it can keep sensory inputs calibrated to maintain the accuracy of its internal representations. We have developed a neural network architecture capable of i) approximating optimal multisensory spatial integration, based on Bayesian causal inference, and ii) recalibrating the spatial encoding of sensory systems. The architecture is based on features of the dorsal processing hierarchy, including the spatial tuning properties of unisensory neurons and the convergence of different sensory inputs onto multisensory neurons. Furthermore, we propose that these unisensory and multisensory neurons play dual roles in i) encoding spatial location as separate or integrated estimates and ii) accumulating evidence for the independence or relatedness of multisensory stimuli. We further propose that top-down feedback connections spanning the dorsal pathway play a key role in recalibrating spatial encoding at the level of early unisensory cortices. Our proposed architecture provides possible explanations for a number of human electrophysiological and neuroimaging results and generates testable predictions linking neurophysiology with behaviour. |
Tasks | Causal Inference |
Published | 2018-02-19 |
URL | http://arxiv.org/abs/1802.06591v3 |
PDF | http://arxiv.org/pdf/1802.06591v3.pdf |
PWC | https://paperswithcode.com/paper/closing-the-loop-on-multisensory-interactions |
Repo | |
Framework | |
Can Neural Generators for Dialogue Learn Sentence Planning and Discourse Structuring?
Title | Can Neural Generators for Dialogue Learn Sentence Planning and Discourse Structuring? |
Authors | Lena Reed, Shereen Oraby, Marilyn Walker |
Abstract | Responses in task-oriented dialogue systems often realize multiple propositions whose ultimate form depends on the use of sentence planning and discourse structuring operations. For example, a recommendation may consist of an explicitly evaluative utterance, e.g. “Chanpen Thai is the best option”, along with content related by the justification discourse relation, e.g. “It has great food and service”, that combines multiple propositions into a single phrase. While neural generation methods integrate sentence planning and surface realization in one end-to-end learning framework, previous work has not shown that neural generators can: (1) perform common sentence planning and discourse structuring operations; (2) make decisions as to whether to realize content in a single sentence or over multiple sentences; and (3) generalize sentence planning and discourse relation operations beyond what was seen in training. We systematically create large training corpora that exhibit particular sentence planning operations and then test neural models to see what they learn. We compare models without explicit latent variables for sentence planning with ones that provide explicit supervision during training. We show that only the models with additional supervision can reproduce sentence planning and discourse operations and generalize to situations unseen in training. |
Tasks | Task-Oriented Dialogue Systems |
Published | 2018-09-09 |
URL | http://arxiv.org/abs/1809.03015v2 |
PDF | http://arxiv.org/pdf/1809.03015v2.pdf |
PWC | https://paperswithcode.com/paper/can-neural-generators-for-dialogue-learn |
Repo | |
Framework | |
Data Science as Political Action: Grounding Data Science in a Politics of Justice
Title | Data Science as Political Action: Grounding Data Science in a Politics of Justice |
Authors | Ben Green |
Abstract | In response to recent controversies, the field of data science has rushed to adopt codes of ethics. Such professional codes, however, are ill-equipped to address broad matters of social justice. Instead of ethics codes, I argue, the field must embrace politics. Data scientists must recognize themselves as political actors engaged in normative constructions of society and, as befits political work, evaluate their work according to its downstream material impacts on people’s lives. I justify this notion in two parts: first, by articulating why data scientists must recognize themselves as political actors, and second, by describing how the field can evolve toward a deliberative and rigorous grounding in a politics of social justice. Part 1 responds to three common arguments that have been invoked by data scientists when they are challenged to take political positions regarding their work. In confronting these arguments, I will demonstrate why attempting to remain apolitical is itself a political stance (a fundamentally conservative one) and why the field’s current attempts to promote “social good” dangerously rely on vague and unarticulated political assumptions. Part 2 proposes a framework for what a politically engaged data science could look like and how to achieve it, recognizing the challenge of reforming the field in this manner. I conceptualize the process of incorporating politics into data science in four stages: becoming interested in directly addressing social issues, recognizing the politics underlying these issues, redirecting existing methods toward new applications, and, finally, developing new practices and methods that orient data science around a mission of social justice. The path ahead does not require data scientists to abandon their technical expertise, but it does entail expanding their notions of what problems to work on and how to engage with society. |
Tasks | |
Published | 2018-11-06 |
URL | http://arxiv.org/abs/1811.03435v2 |
PDF | http://arxiv.org/pdf/1811.03435v2.pdf |
PWC | https://paperswithcode.com/paper/data-science-as-political-action-grounding |
Repo | |
Framework | |
Computational ghost imaging using a field-programmable gate array
Title | Computational ghost imaging using a field-programmable gate array |
Authors | Ikuo Hoshi, Tomoyoshi Shimobaba, Takashi Kakue, Tomoyoshi Ito |
Abstract | Computational ghost imaging is a promising technique for single-pixel imaging because, unlike common cameras, it is robust to disturbance and can be operated over broad wavelength bands. However, one disadvantage of this method is its long calculation time for image reconstruction. In this paper, we have designed a dedicated calculation circuit that accelerates the process of computational ghost imaging. We implemented this circuit using a field-programmable gate array, which reduced the calculation time compared to a CPU. The dedicated circuit reconstructs images at a frame rate of 300 Hz. |
Tasks | Image Reconstruction |
Published | 2018-10-10 |
URL | http://arxiv.org/abs/1810.05670v1 |
PDF | http://arxiv.org/pdf/1810.05670v1.pdf |
PWC | https://paperswithcode.com/paper/computational-ghost-imaging-using-a-field |
Repo | |
Framework | |
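The abstract does not spell out which reconstruction step the circuit accelerates. The sketch below is a plain software illustration using the basic correlation estimator common in computational ghost imaging, G(x) = <S_k I_k(x)> - <S_k><I_k(x)>, where I_k are known illumination patterns and S_k is the single-pixel (bucket) signal. The random binary patterns and all function names are assumptions, and this is not the paper's FPGA design. Its cost grows with the number of patterns times the number of pixels, a multiply-accumulate workload of the kind dedicated hardware can parallelize.

```python
import random


def bucket_signals(obj, patterns):
    """Simulated single-pixel signal S_k = sum_x I_k(x) * T(x) for each pattern I_k."""
    return [sum(i * t for i, t in zip(p, obj)) for p in patterns]


def ghost_reconstruct(patterns, bucket):
    """Correlation estimate G(x) = <S_k I_k(x)> - <S_k><I_k(x)> over K patterns."""
    k, n = len(patterns), len(patterns[0])
    mean_s = sum(bucket) / k
    mean_i = [sum(p[x] for p in patterns) / k for x in range(n)]
    return [
        sum(s * p[x] for s, p in zip(bucket, patterns)) / k - mean_s * mean_i[x]
        for x in range(n)
    ]


def random_patterns(num_patterns, num_pixels, seed=0):
    """Random binary illumination patterns (a common, but here assumed, choice)."""
    rng = random.Random(seed)
    return [[rng.randint(0, 1) for _ in range(num_pixels)] for _ in range(num_patterns)]
```

For independent binary patterns with equal probability of 0 and 1, the expected value of G(x) is 0.25 * T(x), so object pixels stand out against a background near zero once enough patterns are used; the residual noise shrinks roughly with the square root of the number of patterns.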
Frank-Wolfe Algorithm for the Exact Sparse Problem
Title | Frank-Wolfe Algorithm for the Exact Sparse Problem |
Authors | Farah Cherfaoui, Valentin Emiya, Liva Ralaivola, Sandrine Anthoine |
Abstract | In this paper, we study the properties of the Frank-Wolfe algorithm for solving the Exact Sparse reconstruction problem. We prove that when the dictionary is quasi-incoherent, at each iteration, the Frank-Wolfe algorithm picks up an atom indexed by the support. We also prove that when the dictionary is quasi-incoherent, there exists an iteration beyond which the algorithm converges exponentially fast. |
Tasks | |
Published | 2018-12-18 |
URL | http://arxiv.org/abs/1812.07201v1 |
PDF | http://arxiv.org/pdf/1812.07201v1.pdf |
PWC | https://paperswithcode.com/paper/frank-wolfe-algorithm-for-the-exact-sparse |
Repo | |
Framework | |
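The abstract does not state the constraint set or step rule. A common way to apply Frank-Wolfe to sparse reconstruction is least squares over an l1 ball of radius `tau`, where each linear-minimization step selects exactly one dictionary atom, so "the atom picked at each iteration" is well defined. The sketch below assumes that formulation, an exact line search, and `tau` at least as large as the l1 norm of the true coefficients; the function name and parameters are hypothetical.

```python
def frank_wolfe_l1(atoms, y, tau, n_iter=100):
    """Frank-Wolfe for min ||y - D x||^2 subject to ||x||_1 <= tau.

    atoms: the columns of D, each a list of len(y). Returns the coefficient
    vector x and the index of the atom selected at every iteration.
    """
    m, n = len(atoms), len(y)

    def synthesize(coefs):
        # D @ coefs, written out for plain lists
        return [sum(c * a[t] for c, a in zip(coefs, atoms)) for t in range(n)]

    x = [0.0] * m
    picked = []
    for _ in range(n_iter):
        r = [yi - di for yi, di in zip(y, synthesize(x))]
        # The gradient is -2 * D^T r, so the linear-minimization step picks the
        # atom with the largest |<atom, r>|, signed to agree with that correlation.
        corr = [sum(ai * ri for ai, ri in zip(a, r)) for a in atoms]
        i = max(range(m), key=lambda j: abs(corr[j]))
        picked.append(i)
        s = [0.0] * m
        s[i] = tau if corr[i] > 0 else -tau  # a vertex of the l1 ball
        direction = [si - xi for si, xi in zip(s, x)]
        d = synthesize(direction)
        dd = sum(v * v for v in d)
        if dd == 0.0:
            break
        # Exact line search for the quadratic, clipped to the feasible step [0, 1].
        gamma = min(1.0, max(0.0, sum(ri * di for ri, di in zip(r, d)) / dd))
        x = [xi + gamma * di for xi, di in zip(x, direction)]
    return x, picked
```

With an orthonormal (perfectly incoherent) dictionary, off-support atoms have zero correlation with the residual, so selections stay in the support; the paper's result concerns the analogous guarantee under the weaker quasi-incoherence condition.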
Deep Learning-Based Decoding for Constrained Sequence Codes
Title | Deep Learning-Based Decoding for Constrained Sequence Codes |
Authors | Congzhe Cao, Duanshun Li, Ivan Fair |
Abstract | Constrained sequence codes have been widely used in modern communication and data storage systems. Sequences encoded with constrained sequence codes satisfy constraints imposed by the physical channel, hence enabling efficient and reliable transmission of coded symbols. Traditional encoding and decoding of constrained sequence codes rely on table look-up, which is vulnerable to errors introduced during transmission. In this paper, we introduce constrained sequence decoding based on deep learning. With multilayer perceptron (MLP) networks and convolutional neural networks (CNNs), we are able to achieve low bit error rates that are close to those of maximum a posteriori probability (MAP) decoding, as well as improve the system throughput. Moreover, implementation of capacity-achieving fixed-length codes, where the complexity is prohibitively high with table look-up decoding, becomes practical with deep learning-based decoding. |
Tasks | |
Published | 2018-09-06 |
URL | http://arxiv.org/abs/1809.01859v1 |
PDF | http://arxiv.org/pdf/1809.01859v1.pdf |
PWC | https://paperswithcode.com/paper/deep-learning-based-decoding-for-constrained |
Repo | |
Framework | |