Paper Group ANR 667
Crime Event Embedding with Unsupervised Feature Selection
Title | Crime Event Embedding with Unsupervised Feature Selection |
Authors | Shixiang Zhu, Yao Xie |
Abstract | We present a novel event embedding algorithm for crime data that can jointly capture the time, location, and complex free-text component of each event. The embedding is achieved by regularized Restricted Boltzmann Machines (RBMs), and we introduce a new way to regularize by imposing an $\ell_1$ penalty on the conditional distributions of the observed variables of the RBM. This choice of regularization performs feature selection and also leads to efficient computation, since the gradient can be computed in closed form. The feature selection forces the embedding to be based on the most important keywords, which captures the common modus operandi (M.O.) in crime series. Using numerical experiments on a large-scale crime dataset, we show that our regularized RBMs achieve better event embeddings and that the selected features are highly interpretable to humans. |
Tasks | Feature Selection |
Published | 2018-06-15 |
URL | http://arxiv.org/abs/1806.06095v4 |
http://arxiv.org/pdf/1806.06095v4.pdf | |
PWC | https://paperswithcode.com/paper/crime-event-embedding-with-unsupervised |
Repo | |
Framework | |
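The regularized update admits a compact implementation. Below is a minimal sketch of contrastive-divergence (CD-1) training for a Bernoulli RBM with a generic $\ell_1$ subgradient added to the weight update; note the paper penalizes the conditional distributions of the observed variables and derives a closed-form gradient for that penalty, which this simplified weight-level variant does not reproduce. All sizes and hyperparameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, b, c, lr=0.01, lam=0.001):
    """One CD-1 update for a Bernoulli RBM with an l1 penalty on W."""
    # Positive phase: hidden activations given the observed event vectors.
    ph0 = sigmoid(v0 @ W + c)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Negative phase: one Gibbs step back to the visible layer.
    pv1 = sigmoid(h0 @ W.T + b)
    ph1 = sigmoid(pv1 @ W + c)
    # CD approximation of the log-likelihood gradient, minus l1 subgradient.
    dW = v0.T @ ph0 - pv1.T @ ph1
    W += lr * (dW / len(v0) - lam * np.sign(W))
    b += lr * (v0 - pv1).mean(axis=0)
    c += lr * (ph0 - ph1).mean(axis=0)
    return W, b, c

# Toy run: 64 binary keyword indicators embedded into 16 hidden units.
n_vis, n_hid = 64, 16
W = 0.01 * rng.standard_normal((n_vis, n_hid))
b, c = np.zeros(n_vis), np.zeros(n_hid)
batch = (rng.random((32, n_vis)) < 0.1).astype(float)
for _ in range(100):
    W, b, c = cd1_step(batch, W, b, c)
```

The sparsity pushed onto `W` is what forces the embedding to lean on a small set of keywords.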
A Multi-channel DART Algorithm
Title | A Multi-channel DART Algorithm |
Authors | Mathé Zeegers, Felix Lucka, Kees Joost Batenburg |
Abstract | Tomography deals with the reconstruction of objects from their projections, acquired along a range of angles. Discrete tomography is concerned with objects that consist of a small number of materials, which makes it possible to compute accurate reconstructions from highly limited projection data. For cases where the allowed intensity values in the reconstruction are known a priori, the discrete algebraic reconstruction technique (DART) has been shown to yield accurate reconstructions from few projections. However, a key limitation is that the benefit of DART diminishes as the number of different materials increases. Many tomographic imaging techniques can simultaneously record tomographic data at multiple channels, each corresponding to a different weighting of the materials in the object. Whenever projection data from more than one channel are available, this additional information can potentially be exploited by the reconstruction algorithm. In this paper we present Multi-Channel DART (MC-DART), which deals effectively with multi-channel data. This class of algorithms generalizes DART to multiple channels and combines the information from each separate channel reconstruction in a multi-channel segmentation step. We demonstrate in a range of simulation experiments that MC-DART produces more accurate reconstructions than single-channel DART. |
Tasks | |
Published | 2018-08-28 |
URL | http://arxiv.org/abs/1808.09170v1 |
http://arxiv.org/pdf/1808.09170v1.pdf | |
PWC | https://paperswithcode.com/paper/a-multi-channel-dart-algorithm |
Repo | |
Framework | |
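A rough sketch of the MC-DART loop described above, assuming a shared projection matrix `A`, per-channel sinograms, and known per-material grey levels for every channel; the inner reconstructions use a few SIRT iterations, and the DART refinement that keeps only boundary pixels free is omitted for brevity.

```python
import numpy as np

def sirt(A, p, x, n_iter=20):
    """A few SIRT iterations: x <- x + C A^T R (p - A x)."""
    row = A.sum(axis=1); col = A.sum(axis=0)
    R = np.where(row > 0, 1.0 / row, 0.0)
    C = np.where(col > 0, 1.0 / col, 0.0)
    for _ in range(n_iter):
        x = x + C * (A.T @ (R * (p - A @ x)))
    return x

def mc_dart(A, sinograms, grey_levels, n_outer=10):
    """Sketch of MC-DART: per-channel SIRT + joint multi-channel segmentation.

    A           : (n_rays, n_pixels) projection matrix shared by all channels
    sinograms   : (n_channels, n_rays) projection data per channel
    grey_levels : (n_materials, n_channels) known intensity per material/channel
    """
    n_ch, n_pix = len(sinograms), A.shape[1]
    x = np.zeros((n_ch, n_pix))
    labels = np.zeros(n_pix, dtype=int)
    for _ in range(n_outer):
        # 1. Channel-wise algebraic reconstruction.
        for c in range(n_ch):
            x[c] = sirt(A, sinograms[c], x[c])
        # 2. Joint segmentation: assign each pixel the nearest material
        #    vector across *all* channels simultaneously.
        d = ((x.T[:, None, :] - grey_levels[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        x = grey_levels[labels].T.copy()
        # (A full DART step would now free only boundary pixels and re-run
        #  SIRT on them; that refinement is omitted in this sketch.)
    return x, labels
```

The joint segmentation over the stacked channel vectors is what distinguishes this from running DART independently per channel.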
N-fold Superposition: Improving Neural Networks by Reducing the Noise in Feature Maps
Title | N-fold Superposition: Improving Neural Networks by Reducing the Noise in Feature Maps |
Authors | Yang Liu, Qiang Qu, Chao Gao |
Abstract | Since the use of the Fully Connected (FC) layer limits the performance of Convolutional Neural Networks (CNNs), this paper develops a method to improve the coupling between the convolution layers and the FC layer by reducing the noise in Feature Maps (FMs). Our approach consists of three steps. First, we split all the FMs equally into n blocks. Then, the weighted summation of the FMs at the same position in all blocks constitutes a new block of FMs. Finally, we replicate this new block into n copies and concatenate them as the input to the FC layer. This sharing of FMs noticeably reduces their noise and averts the impact of any particular FM on specific weights of the hidden layers, hence preventing the network from overfitting to some extent. Using Fermat's Lemma, we prove that this method widens the range of values at which the loss function attains its global minimum, which makes it easier for neural networks to converge and accelerates the convergence process. The method adds only a few coefficients and thus does not significantly increase the number of network parameters, and experiments demonstrate that it increases the convergence speed and improves the classification performance of neural networks. |
Tasks | |
Published | 2018-04-23 |
URL | http://arxiv.org/abs/1804.08233v3 |
http://arxiv.org/pdf/1804.08233v3.pdf | |
PWC | https://paperswithcode.com/paper/n-fold-superposition-improving-neural |
Repo | |
Framework | |
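The three steps map directly onto a few array operations. A minimal NumPy sketch follows; the shapes and the equal-weight choice are illustrative (the paper learns the block weights, which are the "few coefficients" it mentions).

```python
import numpy as np

def n_fold_superposition(fmaps, weights):
    """The three steps on a (batch, channels, H, W) feature-map tensor.

    channels must be divisible by n = len(weights).
    """
    n = len(weights)
    b, ch, h, w = fmaps.shape
    blocks = fmaps.reshape(b, n, ch // n, h, w)           # 1. split into n blocks
    merged = np.einsum("i,bichw->bchw", weights, blocks)  # 2. weighted summation
    out = np.tile(merged, (1, n, 1, 1))                   # 3. replicate n copies
    return out  # same shape as the input, so the FC layer is unchanged

x = np.random.randn(2, 8, 4, 4)
y = n_fold_superposition(x, weights=np.ones(4) / 4)
assert y.shape == x.shape
```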
Visualizing Neural Network Developing Perturbation Theory
Title | Visualizing Neural Network Developing Perturbation Theory |
Authors | Yadong Wu, Pengfei Zhang, Huitao Shen, Hui Zhai |
Abstract | In this letter, motivated by the question of whether the empirical fitting of data by a neural network can yield the same structure as physical laws, we apply a neural network to a simple quantum mechanical two-body scattering problem with short-range potentials, which by itself also plays an important role in many branches of physics. We train a neural network to accurately predict the $s$-wave scattering length, which governs the low-energy scattering physics, directly from the scattering potential, without solving the Schrödinger equation or obtaining the wavefunction. After analyzing the neural network, we show that it develops perturbation theory order by order as the potential increases. This provides an important benchmark for machine-assisted physics research and even for the automated machine learning of physics laws. |
Tasks | |
Published | 2018-02-12 |
URL | http://arxiv.org/abs/1802.03930v2 |
http://arxiv.org/pdf/1802.03930v2.pdf | |
PWC | https://paperswithcode.com/paper/visualizing-neural-network-developing |
Repo | |
Framework | |
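For concreteness, a sketch of the regression setup the abstract describes, under assumptions of ours: the potential is sampled on a 64-point radial grid and the targets $a_s$ come from a numerical solver (not shown); the architecture is illustrative, not the authors'.

```python
import torch
import torch.nn as nn

# Each sample: a short-range potential V(r) on a 64-point radial grid;
# target: its s-wave scattering length a_s from a numerical solver.
net = nn.Sequential(
    nn.Linear(64, 128), nn.Tanh(),
    nn.Linear(128, 128), nn.Tanh(),
    nn.Linear(128, 1),
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

def train_step(V_batch, a_batch):
    opt.zero_grad()
    loss = nn.functional.mse_loss(net(V_batch).squeeze(-1), a_batch)
    loss.backward()
    opt.step()
    return loss.item()

# Dummy batch just to show the shapes; real (V, a_s) pairs come from the solver.
print(train_step(torch.randn(32, 64), torch.randn(32)))
# For weak potentials the Born approximation makes a_s linear in V, so the
# trained network's response can be expanded and compared order by order.
```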
Sharp Attention Network via Adaptive Sampling for Person Re-identification
Title | Sharp Attention Network via Adaptive Sampling for Person Re-identification |
Authors | Chen Shen, Guo-Jun Qi, Rongxin Jiang, Zhongming Jin, Hongwei Yong, Yaowu Chen, Xian-Sheng Hua |
Abstract | In this paper, we present novel sharp attention networks that adaptively sample feature maps from convolutional neural networks (CNNs) for the person re-identification (re-ID) problem. Due to the introduction of sampling-based attention models, the proposed approach can adaptively generate sharper attention-aware feature masks. This greatly differs from the gating-based attention mechanism, which relies on soft gating functions to select the relevant features for person re-ID. In contrast, the proposed sampling-based attention mechanism allows us to effectively trim irrelevant features by enforcing the resultant feature masks to focus on the most discriminative features. It produces sharper attention maps that are more assertive in localizing subtle features relevant to re-identifying people across cameras. For this purpose, a differentiable Gumbel-Softmax sampler is employed to approximate the Bernoulli sampling when training the sharp attention networks. Extensive experimental evaluations demonstrate the superiority of this new sharp attention model for person re-ID over the other state-of-the-art methods on three challenging benchmarks: CUHK03, Market-1501, and DukeMTMC-reID. |
Tasks | Person Re-Identification |
Published | 2018-05-07 |
URL | http://arxiv.org/abs/1805.02336v2 |
http://arxiv.org/pdf/1805.02336v2.pdf | |
PWC | https://paperswithcode.com/paper/sharp-attention-network-via-adaptive-sampling |
Repo | |
Framework | |
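The Bernoulli sampling can be relaxed with the binary special case of the Gumbel-Softmax (the "binary concrete" distribution). A minimal PyTorch sketch with a straight-through hard mask; the shapes are illustrative, and how the attention logits are produced is left to the network.

```python
import torch

def gumbel_sigmoid_mask(logits, tau=0.5, hard=True):
    """Differentiable approximation of Bernoulli sampling (binary concrete)."""
    u = torch.rand_like(logits).clamp(1e-6, 1 - 1e-6)
    g = torch.log(u) - torch.log1p(-u)        # Logistic(0, 1) noise
    y = torch.sigmoid((logits + g) / tau)      # relaxed Bernoulli sample
    if hard:
        # Straight-through: hard 0/1 mask forward, soft gradient backward.
        y = (y > 0.5).float() + y - y.detach()
    return y

feats = torch.randn(8, 256, 24, 12, requires_grad=True)  # CNN feature maps
logits = torch.randn(8, 256, 24, 12)                     # attention scores
masked = feats * gumbel_sigmoid_mask(logits)             # sharp attention
masked.sum().backward()                                  # gradients flow
```

As the temperature `tau` is annealed toward 0, the relaxed samples approach true Bernoulli draws, which is what makes the resulting masks "sharp" rather than soft gates.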
Tracking all members of a honey bee colony over their lifetime
Title | Tracking all members of a honey bee colony over their lifetime |
Authors | Franziska Boenisch, Benjamin Rosemann, Benjamin Wild, Fernando Wario, David Dormagen, Tim Landgraf |
Abstract | Computational approaches to the analysis of collective behavior in social insects increasingly rely on motion paths as an intermediate data layer from which one can infer individual behaviors or social interactions. Honey bees are a popular model for learning and memory. Previous experience has been shown to affect and modulate future social interactions. So far, no lifetime history observations have been reported for all bees of a colony. In a previous work we introduced a tracking system customized to track up to $4000$ bees over several weeks. In this contribution we present an in-depth description of the underlying multi-step algorithm, which both produces the motion paths and significantly improves the marker decoding accuracy. We automatically tracked ${\sim}2000$ marked honey bees over 10 weeks with inexpensive recording hardware, using markers without any error correction bits. We found that the proposed two-step tracking reduced incorrect ID decodings from initially ${\sim}13\%$ to around $2\%$ post-tracking. Alongside this paper, we publish the first trajectory dataset for all bees in a colony, extracted from ${\sim}4$ million images. We invite researchers to join the collective scientific effort to investigate this intriguing animal system. All components of our system are open-source. |
Tasks | |
Published | 2018-02-09 |
URL | http://arxiv.org/abs/1802.03192v2 |
http://arxiv.org/pdf/1802.03192v2.pdf | |
PWC | https://paperswithcode.com/paper/tracking-all-members-of-a-honey-bee-colony |
Repo | |
Framework | |
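One reason tracking improves decoding, sketched below under a hypothetical decoder-output format: fusing per-frame ID posteriors along one motion path lets many weak observations outvote occasional misreads. This illustrates the effect the abstract reports, not the paper's actual two-step algorithm.

```python
import numpy as np

def track_id(decodings):
    """Fuse noisy per-frame marker decodings along one motion path.

    decodings: (n_frames, n_ids) per-frame ID probabilities from the marker
    decoder. Multiplying per-frame posteriors (summing logs) pools evidence
    over the whole track.
    """
    log_p = np.log(np.clip(decodings, 1e-9, 1.0)).sum(axis=0)
    return int(log_p.argmax())

# Toy example: 20 frames, 12 candidate IDs, true ID 3 read correctly
# only 60% of the time.
rng = np.random.default_rng(1)
obs = np.full((20, 12), 0.4 / 11)
reads = np.where(rng.random(20) < 0.6, 3, rng.integers(0, 12, 20))
obs[np.arange(20), reads] = 0.6
print(track_id(obs))  # almost always 3
```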
Hybrid Approach to Automation, RPA and Machine Learning: a Method for the Human-centered Design of Software Robots
Title | Hybrid Approach to Automation, RPA and Machine Learning: a Method for the Human-centered Design of Software Robots |
Authors | Wiesław Kopeć, Marcin Skibiński, Cezary Biele, Kinga Skorupska, Dominika Tkaczyk, Anna Jaskulska, Katarzyna Abramczuk, Piotr Gago, Krzysztof Marasek |
Abstract | One of the more prominent trends within Industry 4.0 is the drive to employ Robotic Process Automation (RPA), especially as one of the elements of the Lean approach. The full implementation of RPA is riddled with challenges relating both to the reality of everyday business operations, from SMEs to SSCs and beyond, and to the social effects of the changing job market. To successfully address these points, a solution is needed that adjusts to existing business operations while lowering the negative social impact of the automation process. To achieve these goals we propose a hybrid, human-centered approach to the development of software robots. This design and implementation method combines the Living Lab approach with empowerment through participatory design to kick-start the co-development and co-maintenance of hybrid software robots. Supported by a variety of AI methods and tools, including interactive and collaborative ML in the cloud, these robots transform menial job posts into higher-skilled positions, allowing former employees to stay on as robot co-designers and maintainers, i.e. as co-programmers who supervise the machine learning processes using tailored high-level RPA Domain Specific Languages (DSLs) to adjust the functioning of the robots and maintain operational flexibility. |
Tasks | |
Published | 2018-11-06 |
URL | http://arxiv.org/abs/1811.02213v1 |
http://arxiv.org/pdf/1811.02213v1.pdf | |
PWC | https://paperswithcode.com/paper/hybrid-approach-to-automation-rpa-and-machine |
Repo | |
Framework | |
General Value Function Networks
Title | General Value Function Networks |
Authors | Matthew Schlegel, Adam White, Andrew Patterson, Martha White |
Abstract | In this paper we show that restricting the representation layer of a Recurrent Neural Network (RNN) improves accuracy and reduces the depth of recursive training procedures in partially observable domains. Artificial neural networks have been shown to learn useful state representations for high-dimensional visual and continuous control domains. If the task at hand exhibits long dependencies back in time, these instantaneous feed-forward approaches are augmented with recurrent connections and trained with Back-propagation Through Time (BPTT). This unrolled training can become computationally prohibitive if the dependency structure is long, and while recent work on LSTMs and GRUs has improved upon naive training strategies, there is still room for improvement in computational efficiency and parameter sensitivity. In this paper we explore a simple modification to the classic RNN structure: restricting the state to be comprised of multi-step General Value Function predictions. We formulate an architecture called General Value Function Networks (GVFNs) and a corresponding objective that generalizes beyond previous approaches. We show that our GVFNs are significantly more robust to train and facilitate accurate prediction, with no gradients needed back in time, in domains with substantial long-term dependencies. |
Tasks | Continuous Control |
Published | 2018-07-18 |
URL | https://arxiv.org/abs/1807.06763v2 |
https://arxiv.org/pdf/1807.06763v2.pdf | |
PWC | https://paperswithcode.com/paper/general-value-function-networks |
Repo | |
Framework | |
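A toy linear illustration of the core idea, under simplifying assumptions of ours (linear predictions, fixed cumulants and discounts): each state unit is a GVF prediction updated by semi-gradient TD(0), so no gradient ever propagates back through time, in contrast to BPTT.

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs, n_gvf = 4, 8
W = 0.1 * rng.standard_normal((n_gvf, n_gvf + n_obs))
gammas = np.full(n_gvf, 0.9)        # per-GVF discount (continuation)
x = np.zeros(n_gvf + n_obs)         # features: [previous state, previous obs]
v = W @ x                           # current GVF predictions = the RNN state

for t in range(1000):
    obs = rng.random(n_obs)                  # stand-in for an environment obs
    cumulants = np.full(n_gvf, obs.mean())   # hypothetical per-GVF signals
    x_next = np.concatenate([v, obs])
    v_next = W @ x_next
    # TD(0): the prediction made from x should match c + gamma * v_next,
    # so only the one-step bootstrapped target drives the update.
    W += 0.01 * np.outer(cumulants + gammas * v_next - v, x)
    x, v = x_next, v_next
```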
A Re-ranker Scheme for Integrating Large Scale NLU models
Title | A Re-ranker Scheme for Integrating Large Scale NLU models |
Authors | Chengwei Su, Rahul Gupta, Shankar Ananthakrishnan, Spyros Matsoukas |
Abstract | Large scale Natural Language Understanding (NLU) systems are typically trained on large quantities of data, requiring a fast and scalable training strategy. A typical design for NLU systems consists of domain-level NLU modules (domain classifier, intent classifier and named entity recognizer). Hypotheses (NLU interpretations consisting of various intent+slot combinations) from these domain specific modules are typically aggregated by another downstream component. The re-ranker integrates outputs from domain-level recognizers, returning a scored list of cross domain hypotheses. An ideal re-ranker will exhibit the following two properties: (a) it should prefer the most relevant hypothesis for the given input as the top hypothesis and, (b) the interpretation scores corresponding to each hypothesis produced by the re-ranker should be calibrated. Calibration allows the final NLU interpretation score to be comparable across domains. We propose a novel re-ranker strategy that addresses these aspects, while also maintaining domain specific modularity. We design optimization loss functions for such a modularized re-ranker and present results on decreasing the top hypothesis error rate as well as maintaining the model calibration. We also experiment with an extension involving training the domain specific re-rankers on datasets curated independently by each domain to allow further asynchronization. The proposed re-ranker design showcases the following: (i) improved NLU performance over an unweighted aggregation strategy, (ii) cross-domain calibrated performance and, (iii) support for use cases involving training each re-ranker on datasets curated by each domain independently. |
Tasks | Calibration |
Published | 2018-09-25 |
URL | http://arxiv.org/abs/1809.09605v1 |
http://arxiv.org/pdf/1809.09605v1.pdf | |
PWC | https://paperswithcode.com/paper/a-re-ranker-scheme-for-integrating-large |
Repo | |
Framework | |
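A sketch of the cross-domain aggregation idea, with assumptions of ours throughout: per-domain Platt (logistic) scaling stands in for the paper's learned calibration, and the hypothesis format is invented for illustration.

```python
import numpy as np

def platt(score, a, b):
    """Per-domain calibration: map a raw recognizer score to a probability."""
    return 1.0 / (1.0 + np.exp(-(a * score + b)))

def rerank(domain_hyps, calib):
    """Pool per-domain hypotheses into one calibrated, cross-domain list.

    domain_hyps: {domain: [(hypothesis, raw_score), ...]}
    calib:       {domain: (a, b)} Platt parameters fit per domain, making
                 the calibrated scores comparable across domains.
    """
    pooled = []
    for dom, hyps in domain_hyps.items():
        a, b = calib[dom]
        for hyp, s in hyps:
            pooled.append((platt(s, a, b), dom, hyp))
    return sorted(pooled, reverse=True)

hyps = {
    "Music":   [("PlayIntent|artist=Adele", 2.1), ("PlayIntent|album=25", 0.3)],
    "Weather": [("ForecastIntent|city=Oslo", 1.4)],
}
calib = {"Music": (1.2, -0.5), "Weather": (0.8, 0.1)}
print(rerank(hyps, calib)[0])  # top cross-domain hypothesis
```

Because each domain keeps its own recognizer and calibrator, a domain's re-ranker can be retrained on independently curated data without touching the others, which is the modularity the abstract emphasizes.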
Towards Functorial Language-Games
Title | Towards Functorial Language-Games |
Authors | Jules Hedges, Martha Lewis |
Abstract | In categorical compositional semantics of natural language one studies functors from a category of grammatical derivations (such as a Lambek pregroup) to a semantic category (such as real vector spaces). We compositionally build game-theoretic semantics of sentences by taking the semantic category to be the category whose morphisms are open games. This requires some modifications to the grammar category to compensate for the failure of open games to form a compact closed category. We illustrate the theory using simple examples of Wittgenstein’s language-games. |
Tasks | |
Published | 2018-07-20 |
URL | http://arxiv.org/abs/1807.07828v2 |
http://arxiv.org/pdf/1807.07828v2.pdf | |
PWC | https://paperswithcode.com/paper/towards-functorial-language-games |
Repo | |
Framework | |
Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion
Title | Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion |
Authors | Jacob Buckman, Danijar Hafner, George Tucker, Eugene Brevdo, Honglak Lee |
Abstract | Integrating model-free and model-based approaches in reinforcement learning has the potential to achieve the high performance of model-free algorithms with low sample complexity. However, this is difficult because an imperfect dynamics model can degrade the performance of the learning algorithm, and in sufficiently complex environments, the dynamics model will almost always be imperfect. As a result, a key challenge is to combine model-based approaches with model-free learning in such a way that errors in the model do not degrade performance. We propose stochastic ensemble value expansion (STEVE), a novel model-based technique that addresses this issue. By dynamically interpolating between model rollouts of various horizon lengths for each individual example, STEVE ensures that the model is only utilized when doing so does not introduce significant errors. Our approach outperforms model-free baselines on challenging continuous control benchmarks with an order-of-magnitude increase in sample efficiency, and in contrast to previous model-based approaches, performance does not degrade in complex environments. |
Tasks | Continuous Control |
Published | 2018-07-04 |
URL | https://arxiv.org/abs/1807.01675v2 |
https://arxiv.org/pdf/1807.01675v2.pdf | |
PWC | https://paperswithcode.com/paper/sample-efficient-reinforcement-learning-with |
Repo | |
Framework | |
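The key interpolation can be written in a few lines. A sketch, assuming the per-horizon, per-ensemble-member value targets have already been rolled out: each horizon is weighted by the inverse variance of its ensemble, so horizons where the model disagrees with itself contribute little.

```python
import numpy as np

def steve_target(candidates):
    """Combine multi-horizon value targets, STEVE-style.

    candidates: (n_horizons, n_ensemble) array; candidates[h, m] is the
    h-step model-based target from ensemble member m.
    """
    mean = candidates.mean(axis=1)
    var = candidates.var(axis=1) + 1e-8  # epsilon avoids division by zero
    w = 1.0 / var                        # inverse-variance weighting
    return float((w * mean).sum() / w.sum())

# Toy example: the 0-step (model-free) target is consistent, while longer
# rollouts increasingly disagree across the ensemble.
rng = np.random.default_rng(0)
cands = np.stack([10 + rng.normal(0, 0.1 + 0.5 * h, size=8) for h in range(4)])
print(steve_target(cands))  # close to 10, dominated by low-variance horizons
```

When the dynamics model is poor, the long-horizon variances blow up and the target degrades gracefully toward the model-free estimate, which is why performance does not collapse in complex environments.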
Entity-Duet Neural Ranking: Understanding the Role of Knowledge Graph Semantics in Neural Information Retrieval
Title | Entity-Duet Neural Ranking: Understanding the Role of Knowledge Graph Semantics in Neural Information Retrieval |
Authors | Zhenghao Liu, Chenyan Xiong, Maosong Sun, Zhiyuan Liu |
Abstract | This paper presents the Entity-Duet Neural Ranking Model (EDRM), which introduces knowledge graphs to neural search systems. EDRM represents queries and documents by their words and entity annotations. The semantics from knowledge graphs are integrated in the distributed representations of their entities, while the ranking is conducted by interaction-based neural ranking networks. The two components are learned end-to-end, making EDRM a natural combination of entity-oriented search and neural information retrieval. Our experiments on a commercial search log demonstrate the effectiveness of EDRM. Our analyses reveal that knowledge graph semantics significantly improve the generalization ability of neural ranking models. |
Tasks | Information Retrieval, Knowledge Graphs |
Published | 2018-05-19 |
URL | https://arxiv.org/abs/1805.07591v2 |
https://arxiv.org/pdf/1805.07591v2.pdf | |
PWC | https://paperswithcode.com/paper/entity-duet-neural-ranking-understanding-the-1 |
Repo | |
Framework | |
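A simplified interaction-based sketch in the spirit of EDRM, assuming pre-computed word and entity embeddings: words and entity annotations form one bag of vectors per text, and a K-NRM-style kernel pooling over the cosine translation matrix produces the ranking score. Dimensions and kernel settings are illustrative.

```python
import torch
import torch.nn.functional as F

def kernel_pool(sim, mus, sigma=0.1):
    # sim: (n_q, n_d) cosine translation matrix; one soft-match count per
    # RBF kernel per query term, then log-summed over query terms (K-NRM).
    k = torch.exp(-((sim.unsqueeze(-1) - mus) ** 2) / (2 * sigma ** 2))
    return torch.log1p(k.sum(dim=1)).sum(dim=0)       # (n_kernels,)

def edrm_score(q_words, q_ents, d_words, d_ents, mus, w):
    # Words and entity annotations form one bag of unit vectors per text.
    q = F.normalize(torch.cat([q_words, q_ents]), dim=-1)
    d = F.normalize(torch.cat([d_words, d_ents]), dim=-1)
    return kernel_pool(q @ d.T, mus) @ w              # scalar ranking score

mus = torch.linspace(-0.9, 1.0, 11)  # kernel centers spanning the cosine range
w = torch.randn(11, requires_grad=True)
score = edrm_score(torch.randn(3, 64), torch.randn(2, 64),
                   torch.randn(30, 64), torch.randn(5, 64), mus, w)
score.backward()  # end-to-end training would use a pairwise ranking loss
```

In the full model the entity vectors are themselves composed from knowledge-graph descriptions and types rather than drawn at random as they are here.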
Scalable GAM using sparse variational Gaussian processes
Title | Scalable GAM using sparse variational Gaussian processes |
Authors | Vincent Adam, Nicolas Durrande, ST John |
Abstract | Generalized additive models (GAMs) are a widely used class of models that are of interest to statisticians because they provide a flexible way to design interpretable models of data beyond linear models. We propose a scalable and well-calibrated Bayesian treatment of GAMs using Gaussian processes (GPs), leveraging recent advances in variational inference. We use sparse GPs to represent each component and exploit the additive structure of the model to efficiently represent the Gaussian a posteriori coupling between the components. |
Tasks | Gaussian Processes |
Published | 2018-12-28 |
URL | http://arxiv.org/abs/1812.11106v1 |
http://arxiv.org/pdf/1812.11106v1.pdf | |
PWC | https://paperswithcode.com/paper/scalable-gam-using-sparse-variational |
Repo | |
Framework | |
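For flavor, a sketch of the additive sparse-GP structure using GPflow 2.x as a stand-in: one kernel per covariate, summed, inside a stock SVGP. Note this shares one set of inducing inputs and a generic variational posterior across components, whereas the paper gives each component its own sparse GP and exploits the additive structure for the posterior coupling.

```python
import numpy as np
import gpflow

# Hypothetical data: 500 points, 3 covariates, additive ground truth.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, (500, 3))
Y = np.sin(6 * X[:, :1]) + X[:, 1:2] ** 2 + 0.1 * rng.standard_normal((500, 1))

# GAM structure: one kernel per covariate, summed.
kernel = (gpflow.kernels.Matern32(active_dims=[0])
          + gpflow.kernels.Matern32(active_dims=[1])
          + gpflow.kernels.Matern32(active_dims=[2]))
Z = X[rng.choice(500, 20, replace=False)]  # shared inducing inputs
model = gpflow.models.SVGP(
    kernel, gpflow.likelihoods.Gaussian(), inducing_variable=Z, num_data=500
)
gpflow.optimizers.Scipy().minimize(
    model.training_loss_closure((X, Y)), model.trainable_variables
)
```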
DPRed: Making Typical Activation and Weight Values Matter In Deep Learning Computing
Title | DPRed: Making Typical Activation and Weight Values Matter In Deep Learning Computing |
Authors | Alberto Delmas, Sayeh Sharify, Patrick Judd, Kevin Siu, Milos Nikolic, Andreas Moshovos |
Abstract | We show that selecting a single data type (precision) for all values in Deep Neural Networks, even if that data type is different per layer, amounts to worst-case design. Much shorter data types can be used if we target the common case by adjusting the precision at a much finer granularity. We propose Dynamic Precision Reduction (DPRed), where we group weights and activations and encode them using a precision specific to each group. The per-group precisions are selected statically for the weights and dynamically by hardware for the activations. We exploit these precisions to reduce: 1) off-chip storage and off- and on-chip communication, and 2) execution time. DPRed compression reduces off-chip traffic to nearly 35% and 33% of its uncompressed volume on average for 16b and 8b models, respectively. This makes it possible to sustain higher performance for a given off-chip memory interface while also boosting energy efficiency. We also demonstrate designs where the time required to process each group of activations and/or weights scales proportionally to the precision they use, for both convolutional and fully-connected layers. This improves execution time and energy efficiency for both dense and sparse networks. We show the techniques work with 8-bit networks, where speedups of 1.82x and 2.81x are achieved for two different hardware variants that take advantage of dynamic precision variability. |
Tasks | |
Published | 2018-04-17 |
URL | http://arxiv.org/abs/1804.06732v3 |
http://arxiv.org/pdf/1804.06732v3.pdf | |
PWC | https://paperswithcode.com/paper/dpred-making-typical-activation-and-weight |
Repo | |
Framework | |
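The per-group precision selection reduces, in essence, to a max over each group. A toy sketch under our own simplified encoding (sign-magnitude bit count per group; the hardware encoding in the paper differs):

```python
import numpy as np

def group_precisions(values, group_size=16):
    """Per-group dynamic precision: each group is stored at the precision of
    its widest member, instead of one worst-case precision per tensor."""
    v = np.abs(values.astype(np.int64)).reshape(-1, group_size)
    # Bits for the largest magnitude in each group, plus one sign bit.
    return np.ceil(np.log2(v.max(axis=1) + 1)).astype(int) + 1

rng = np.random.default_rng(0)
acts = rng.integers(-7, 8, size=1024)              # mostly small values
acts[::64] = rng.integers(-30000, 30000, size=16)  # a few large outliers
bits = group_precisions(acts)
print(f"mean bits/group: {bits.mean():.1f} vs worst-case 16")
```

The outliers force only their own groups to a wide encoding; every other group stays narrow, which is where the traffic reduction comes from.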
Hierarchical Metric Learning and Matching for 2D and 3D Geometric Correspondences
Title | Hierarchical Metric Learning and Matching for 2D and 3D Geometric Correspondences |
Authors | Mohammed E. Fathy, Quoc-Huy Tran, M. Zeeshan Zia, Paul Vernaza, Manmohan Chandraker |
Abstract | Interest point descriptors have fueled progress on almost every problem in computer vision. Recent advances in deep neural networks have enabled task-specific learned descriptors that outperform hand-crafted descriptors on many problems. We demonstrate that commonly used metric learning approaches do not optimally leverage the feature hierarchies learned in a Convolutional Neural Network (CNN), especially when applied to the task of geometric feature matching. While a metric loss applied to the deepest layer of a CNN is often expected to yield ideal features irrespective of the task, in fact the growing receptive field as well as striding effects cause shallower features to be better at high-precision matching tasks. We leverage this insight, together with explicit supervision at multiple levels of the feature hierarchy for better regularization, to learn more effective descriptors in the context of geometric matching tasks. Further, we propose to use activation maps at different layers of a CNN as an effective and principled replacement for the multi-resolution image pyramids often used for matching tasks. We propose concrete CNN architectures employing these ideas, and evaluate them on multiple datasets for 2D and 3D geometric matching as well as optical flow, demonstrating state-of-the-art results and generalization across datasets. |
Tasks | Metric Learning, Optical Flow Estimation |
Published | 2018-03-20 |
URL | http://arxiv.org/abs/1803.07231v3 |
http://arxiv.org/pdf/1803.07231v3.pdf | |
PWC | https://paperswithcode.com/paper/hierarchical-metric-learning-and-matching-for |
Repo | |
Framework | |
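A compact PyTorch sketch of supervising several depths of the hierarchy at once, with assumptions of ours: globally pooled per-level descriptors and a triplet loss, whereas the paper matches dense per-pixel descriptors and uses its own architectures.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HierDescriptor(nn.Module):
    """Descriptors taken at several depths of a small CNN."""
    def __init__(self):
        super().__init__()
        self.b1 = nn.Sequential(nn.Conv2d(3, 32, 3, 2, 1), nn.ReLU())
        self.b2 = nn.Sequential(nn.Conv2d(32, 64, 3, 2, 1), nn.ReLU())
        self.b3 = nn.Sequential(nn.Conv2d(64, 128, 3, 2, 1), nn.ReLU())

    def forward(self, x):
        f1 = self.b1(x); f2 = self.b2(f1); f3 = self.b3(f2)
        # One descriptor per level, shallow (precise) to deep (invariant).
        return [F.normalize(f.mean(dim=(2, 3)), dim=1) for f in (f1, f2, f3)]

def multi_level_loss(descs_a, descs_p, descs_n, margin=0.2):
    """Triplet loss summed over all levels of the feature hierarchy."""
    loss = 0.0
    for a, p, n in zip(descs_a, descs_p, descs_n):
        loss = loss + F.triplet_margin_loss(a, p, n, margin=margin)
    return loss

net = HierDescriptor()
a, p, n = (torch.randn(4, 3, 64, 64) for _ in range(3))
loss = multi_level_loss(net(a), net(p), net(n))
loss.backward()
```

At matching time the same per-level activation maps double as a built-in coarse-to-fine pyramid: candidates found with the deep, invariant features are refined with the shallow, high-resolution ones.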