Paper Group ANR 210
An Overview of In-memory Processing with Emerging Non-volatile Memory for Data-intensive Applications. Online Pricing with Reserve Price Constraint for Personal Data Markets. Disease gene prioritization using network topological analysis from a sequence based human functional linkage network. Deep Reinforcement Learning for Industrial Insertion Tas …
An Overview of In-memory Processing with Emerging Non-volatile Memory for Data-intensive Applications
Title | An Overview of In-memory Processing with Emerging Non-volatile Memory for Data-intensive Applications |
Authors | Bing Li, Bonan Yan, Hai Li |
Abstract | The conventional von Neumann architecture has been revealed as a major performance and energy bottleneck for rising data-intensive applications, due to their intensive data movements. The decade-old idea of leveraging in-memory processing to eliminate substantial data movements has returned and led to extensive research activities. The effectiveness of in-memory processing heavily relies on memory scalability, which cannot be satisfied by traditional memory technologies. Emerging non-volatile memories (eNVMs), on the other hand, possess appealing qualities such as excellent scaling and low energy consumption, and have been heavily investigated and explored for realizing in-memory processing architectures. In this paper, we summarize the recent research progress in eNVM-based in-memory processing from various aspects, including the adopted memory technologies, the location of in-memory processing in the system, the supported arithmetic, and the targeted applications. |
Tasks | |
Published | 2019-06-15 |
URL | https://arxiv.org/abs/1906.06603v1 |
https://arxiv.org/pdf/1906.06603v1.pdf | |
PWC | https://paperswithcode.com/paper/an-overview-of-in-memory-processing-with |
Repo | |
Framework | |
Online Pricing with Reserve Price Constraint for Personal Data Markets
Title | Online Pricing with Reserve Price Constraint for Personal Data Markets |
Authors | Chaoyue Niu, Zhenzhe Zheng, Fan Wu, Shaojie Tang, Guihai Chen |
Abstract | Society's insatiable appetite for personal data is driving the emergence of data markets, allowing data consumers to launch customized queries over the datasets collected by a data broker from data owners. In this paper, we study how the data broker can maximize her cumulative revenue by posting reasonable prices for sequential queries. We propose a contextual dynamic pricing mechanism with a reserve price constraint, which features the properties of the ellipsoid method for efficient online optimization and can support linear and non-linear market value models with uncertainty. In particular, under low uncertainty, our pricing mechanism achieves a worst-case regret logarithmic in the number of queries. We further extend the mechanism to similar application scenarios, including hospitality services, online advertising, and loan applications, and extensively evaluate three pricing instances (noisy linear queries, accommodation rental, and ad impressions) over the MovieLens 20M dataset, Airbnb listings in major U.S. cities, and the Avazu mobile ad click dataset, respectively. The analysis and evaluation results reveal that our proposed pricing mechanism incurs low practical regret, online latency, and memory overhead, and also demonstrate that the existence of a reserve price can mitigate the cold-start problem in a posted-price mechanism and thus reduce the cumulative regret. |
Tasks | |
Published | 2019-11-28 |
URL | https://arxiv.org/abs/1911.12598v1 |
https://arxiv.org/pdf/1911.12598v1.pdf | |
PWC | https://paperswithcode.com/paper/online-pricing-with-reserve-price-constraint |
Repo | |
Framework | |
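The reserve-price-constrained posted pricing described in the abstract above can be sketched as a minimal loop. This is an illustration only: it omits the paper's ellipsoid-based learning of the market value model, and the function name and all numbers are hypothetical.

```python
def post_prices(estimates, reserve, valuations):
    """One pass of a posted-price mechanism with a reserve-price floor.

    estimates  -- the broker's current market-value estimates, one per query
    reserve    -- the data owners' reserve price (a hard lower bound on prices)
    valuations -- the buyers' private valuations (unknown to the broker)

    Returns the cumulative revenue: each buyer purchases if and only if
    her valuation meets the posted price.
    """
    revenue = 0.0
    for est, val in zip(estimates, valuations):
        price = max(est, reserve)  # the reserve price constrains the posted price
        if val >= price:           # buyer accepts the posted price
            revenue += price
    return revenue

# Example: estimates [3, 5], reserve 4, valuations [4, 6]
# -> posted prices are 4 and 5, both accepted, revenue 9.0.
total = post_prices([3, 5], 4, [4, 6])
```

Note how the reserve price gives the broker a sensible price floor before her value estimates become accurate, which is one intuition for why a reserve price can mitigate the cold-start problem mentioned in the abstract.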
Disease gene prioritization using network topological analysis from a sequence based human functional linkage network
Title | Disease gene prioritization using network topological analysis from a sequence based human functional linkage network |
Authors | Ali Jalilvand, Behzad Akbari, Fatemeh Zare Mirakabad, Foad Ghaderi |
Abstract | Sequencing a large number of candidate disease genes in order to identify their relationships to diseases is an expensive and time-consuming task. To handle these challenges, different computational approaches have been developed. Based on the observation that genes associated with similar diseases have a higher likelihood of interaction, a large class of these approaches rely on analyzing the topological properties of biological networks. However, the incomplete and noisy nature of biological networks is an important challenge for these approaches. In this paper, we propose a two-step framework for disease gene prioritization: (1) construction of a reliable human functional linkage network (FLN) using sequence information and machine learning techniques, and (2) prioritization of disease gene relations based on the constructed FLN. Unlike other FLN-based frameworks, which build FLNs by integrating various low-quality biological data, our framework uses protein sequences as comprehensive data to construct a reliable initial network. In addition, the physicochemical properties of amino acids are employed to describe the functionality of proteins. The proposed approach is evaluated, and the results indicate the high efficiency and validity of the constructed FLN for disease gene prioritization. |
Tasks | |
Published | 2019-04-15 |
URL | http://arxiv.org/abs/1904.06973v1 |
http://arxiv.org/pdf/1904.06973v1.pdf | |
PWC | https://paperswithcode.com/paper/disease-gene-prioritization-using-network |
Repo | |
Framework | |
Deep Reinforcement Learning for Industrial Insertion Tasks with Visual Inputs and Natural Rewards
Title | Deep Reinforcement Learning for Industrial Insertion Tasks with Visual Inputs and Natural Rewards |
Authors | Gerrit Schoettler, Ashvin Nair, Jianlan Luo, Shikhar Bahl, Juan Aparicio Ojea, Eugen Solowjow, Sergey Levine |
Abstract | Connector insertion and many other tasks commonly found in modern manufacturing settings involve complex contact dynamics and friction. Since it is difficult to capture the related physical effects with first-order modeling, traditional control methods often result in brittle and inaccurate controllers, which have to be manually tuned. Reinforcement learning (RL) methods have been demonstrated to be capable of learning controllers in such settings from autonomous interaction, but running RL algorithms in the real world poses sample-efficiency and safety challenges. Moreover, in practical real-world settings we cannot assume access to perfect state information or dense reward signals. In this paper, we consider a variety of difficult industrial insertion tasks with visual inputs and different natural reward specifications, namely sparse rewards and goal images. We show that methods that combine RL with prior information, such as classical controllers or demonstrations, can solve these tasks with a reasonable amount of real-world interaction. |
Tasks | |
Published | 2019-06-13 |
URL | https://arxiv.org/abs/1906.05841v2 |
https://arxiv.org/pdf/1906.05841v2.pdf | |
PWC | https://paperswithcode.com/paper/deep-reinforcement-learning-for-industrial |
Repo | |
Framework | |
Case-Based Reasoning for Assisting Domain Experts in Processing Fraud Alerts of Black-Box Machine Learning Models
Title | Case-Based Reasoning for Assisting Domain Experts in Processing Fraud Alerts of Black-Box Machine Learning Models |
Authors | Hilde J. P. Weerts, Werner van Ipenburg, Mykola Pechenizkiy |
Abstract | In many contexts, it can be useful for domain experts to understand to what extent predictions made by a machine learning model can be trusted. In particular, estimates of trustworthiness can be useful for fraud analysts who process machine learning-generated alerts of fraudulent transactions. In this work, we present a case-based reasoning (CBR) approach that provides evidence on the trustworthiness of a prediction in the form of a visualization of similar previous instances. Different from previous works, we consider similarity of local post-hoc explanations of predictions and show empirically that our visualization can be useful for processing alerts. Furthermore, our approach is perceived as useful and easy to use by fraud analysts at a major Dutch bank. |
Tasks | |
Published | 2019-07-07 |
URL | https://arxiv.org/abs/1907.03334v1 |
https://arxiv.org/pdf/1907.03334v1.pdf | |
PWC | https://paperswithcode.com/paper/case-based-reasoning-for-assisting-domain |
Repo | |
Framework | |
The Role of Coded Side Information in Single-Server Private Information Retrieval
Title | The Role of Coded Side Information in Single-Server Private Information Retrieval |
Authors | Anoosheh Heidarzadeh, Fatemeh Kazemi, Alex Sprintson |
Abstract | We study the role of coded side information in single-server Private Information Retrieval (PIR). An instance of the single-server PIR problem includes a server that stores a database of $K$ independently and uniformly distributed messages, and a user who wants to retrieve one of these messages from the server. We consider settings in which the user initially has access to coded side information consisting of a linear combination of a subset of $M$ messages in the database. We assume that the identities of the $M$ messages that form the support set of the coded side information, as well as the coding coefficients, are initially unknown to the server. We consider two different models, depending on whether or not the support set of the coded side information includes the requested message. We also consider the following two privacy requirements: (i) the identities of both the demand and the support set of the coded side information need to be protected, or (ii) only the identity of the demand needs to be protected. For each model and each privacy requirement, we consider the problem of designing a protocol for generating the user's query and the server's answer that enables the user to decode the requested message while satisfying the privacy requirement. We characterize the (scalar-linear) capacity of each setting, defined as the ratio of the number of information bits in a message to the minimum number of information bits downloaded from the server over all (scalar-linear) protocols that satisfy the privacy condition. Our converse proofs rely on new information-theoretic arguments, tailored to the setting of single-server PIR and different from the techniques commonly used in multi-server PIR settings. We also present novel capacity-achieving scalar-linear protocols for each of the settings considered. |
Tasks | Information Retrieval |
Published | 2019-10-16 |
URL | https://arxiv.org/abs/1910.07612v1 |
https://arxiv.org/pdf/1910.07612v1.pdf | |
PWC | https://paperswithcode.com/paper/the-role-of-coded-side-information-in-single |
Repo | |
Framework | |
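A toy GF(2) example can illustrate why coded side information reduces the download cost in the setting above. This sketch shows only the decoding step; designing queries that additionally hide the demand's identity from the server is the paper's actual contribution, and the message values here are arbitrary.

```python
def xor_bytes(a, b):
    """XOR two equal-length byte strings (addition over GF(2))."""
    return bytes(x ^ y for x, y in zip(a, b))

# A server database of K = 3 one-byte messages.
W1, W2, W3 = b"\x0b", b"\xc4", b"\x5e"

# The user holds coded side information over a support set of M = 2
# messages: the GF(2) linear combination W1 + W2.
side_info = xor_bytes(W1, W2)

# To decode the demand W1, the user only needs to download W2 and
# subtract it (XOR) from the side information -- one message instead
# of the whole database.
recovered = xor_bytes(side_info, W2)
assert recovered == W1
```

In the actual protocols, the requested symbol is embedded in carefully chosen coded downloads so that the server cannot infer which message (or which support set) the user holds.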
A Large Parallel Corpus of Full-Text Scientific Articles
Title | A Large Parallel Corpus of Full-Text Scientific Articles |
Authors | Felipe Soares, Viviane Pereira Moreira, Karin Becker |
Abstract | The Scielo database is an important source of scientific information in Latin America, containing articles from several research domains. A striking characteristic of Scielo is that many of its full-text contents are presented in more than one language, thus being a potential source of parallel corpora. In this article, we present the development of a parallel corpus from Scielo in three languages: English, Portuguese, and Spanish. Sentences were automatically aligned using the Hunalign algorithm for all language pairs, as well as for a subset of trilingual articles. We demonstrate the capabilities of our corpus by training a Statistical Machine Translation system (Moses) for each language pair, which outperformed related works on scientific articles. Sentence alignment was also manually evaluated, showing an average of 98.8% correctly aligned sentences across all languages. Our parallel corpus is freely available in the TMX format, with complementary information regarding article metadata. |
Tasks | Machine Translation |
Published | 2019-05-06 |
URL | https://arxiv.org/abs/1905.01852v1 |
https://arxiv.org/pdf/1905.01852v1.pdf | |
PWC | https://paperswithcode.com/paper/a-large-parallel-corpus-of-full-text-1 |
Repo | |
Framework | |
What does it mean to solve the problem of discrimination in hiring? Social, technical and legal perspectives from the UK on automated hiring systems
Title | What does it mean to solve the problem of discrimination in hiring? Social, technical and legal perspectives from the UK on automated hiring systems |
Authors | Javier Sanchez-Monedero, Lina Dencik, Lilian Edwards |
Abstract | The ability to get and keep a job is a key aspect of participating in society and sustaining livelihoods. Yet the way decisions are made on who is eligible for jobs, and why, is rapidly changing with the advent and growing uptake of automated hiring systems (AHSs) powered by data-driven tools. Key concerns about such AHSs include the lack of transparency and the potential limitation of access to jobs for specific profiles. In relation to the latter, however, several of these AHSs claim to detect and mitigate discriminatory practices against protected groups and to promote diversity and inclusion at work. Yet whilst these tools have a growing user base around the world, such claims of bias mitigation are rarely scrutinised and evaluated, and when they are, it is almost exclusively from a US socio-legal perspective. In this paper, we introduce a perspective outside the US by critically examining how three prominent AHSs in regular use in the UK (HireVue, Pymetrics and Applied) understand and attempt to mitigate bias and discrimination. Using publicly available documents, we describe how their tools are designed, validated and audited for bias, highlighting assumptions and limitations, before situating these in the socio-legal context of the UK. The UK has a very different legal background to the US, not only in terms of hiring and equality law, but also in terms of data protection (DP) law. We argue that this might be important for addressing concerns about transparency, and could pose a challenge to building into AHSs bias mitigation that is definitively capable of meeting EU legal standards. This is significant as these AHSs, especially those developed in the US, may obscure rather than improve systemic discrimination in the workplace. |
Tasks | |
Published | 2019-09-28 |
URL | https://arxiv.org/abs/1910.06144v2 |
https://arxiv.org/pdf/1910.06144v2.pdf | |
PWC | https://paperswithcode.com/paper/what-does-it-mean-to-solve-the-problem-of |
Repo | |
Framework | |
Multi Objective Particle Swarm Optimization based Cooperative Agents with Automated Negotiation
Title | Multi Objective Particle Swarm Optimization based Cooperative Agents with Automated Negotiation |
Authors | Najwa Kouka, Raja Fdhila, Adel M. Alimi |
Abstract | This paper investigates a new hybridization of multi-objective particle swarm optimization (MOPSO) and cooperative agents (MOPSO-CA) to handle the stagnation problem in MOPSO, which causes solutions to become trapped in local optima. The proposed approach involves a new distribution strategy based on the idea of maintaining a set of sub-populations, each of which is processed by one agent. The numbers of sub-populations and agents are adjusted dynamically through Pareto ranking. This method allocates a dynamic number of sub-populations as required to improve diversity in the search space. Additionally, agents are used for better management of exploitation within a sub-population and of exploration among sub-populations. Furthermore, we investigate automated negotiation among agents in order to share the best knowledge. To validate our approach, several benchmarks are performed. The results show that the introduced variant ensures the trade-off between exploitation and exploration with respect to the comparative algorithms. |
Tasks | |
Published | 2019-01-27 |
URL | http://arxiv.org/abs/1901.09292v1 |
http://arxiv.org/pdf/1901.09292v1.pdf | |
PWC | https://paperswithcode.com/paper/multi-objective-particle-swarm-optimization-1 |
Repo | |
Framework | |
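The Pareto ranking that drives the dynamic allocation of sub-populations in the abstract above rests on the standard dominance relation, which can be sketched as follows (a generic illustration for minimization objectives, not the paper's full MOPSO-CA procedure):

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (minimization):
    a is no worse in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(population):
    """Return the non-dominated objective vectors (Pareto rank 0)
    of a population; lower-ranked fronts are found by removing this
    front and repeating."""
    return [p for p in population
            if not any(dominates(q, p) for q in population if q is not p)]

# Example: (2, 2) is dominated by (1, 2), so the front is [(1, 2), (2, 1)].
front = pareto_front([(1, 2), (2, 1), (2, 2)])
```

Repeatedly peeling off fronts in this way yields the Pareto ranks that a MOPSO variant can use to size and assign sub-populations.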
This dataset does not exist: training models from generated images
Title | This dataset does not exist: training models from generated images |
Authors | Victor Besnier, Himalaya Jain, Andrei Bursuc, Matthieu Cord, Patrick Pérez |
Abstract | Current generative networks are increasingly proficient in generating high-resolution realistic images. These generative networks, especially the conditional ones, can potentially become a great tool for providing new image datasets. This naturally raises the question: can we train a classifier only on the generated data? This potential availability of nearly unlimited amounts of training data challenges standard practices for training machine learning models, which have been crafted over the years for limited and fixed-size datasets. In this work we investigate this question and its related challenges. We identify ways to significantly improve performance over naively training with standard heuristics on randomly generated images. We propose three standalone techniques that can be applied at different stages of the pipeline, i.e., data generation, training on generated data, and deploying on real data. We evaluate our proposed approaches on a subset of the ImageNet dataset and show encouraging results compared to classifiers trained on real images. |
Tasks | |
Published | 2019-11-07 |
URL | https://arxiv.org/abs/1911.02888v1 |
https://arxiv.org/pdf/1911.02888v1.pdf | |
PWC | https://paperswithcode.com/paper/this-dataset-does-not-exist-training-models |
Repo | |
Framework | |
DimDraw – A novel tool for drawing concept lattices
Title | DimDraw – A novel tool for drawing concept lattices |
Authors | Dominik Dürrschnabel, Tom Hanika, Gerd Stumme |
Abstract | Concept lattice drawings are an important tool for visualizing complex relations in data in a manner simple for human readers. Many attempts have been made to transfer classical graph drawing approaches to order diagrams. Although those methods are satisfactory for some lattices, they unfortunately perform poorly in general. In this work we present a novel tool for drawing concept lattices that is purely motivated by the order structure. |
Tasks | |
Published | 2019-03-02 |
URL | http://arxiv.org/abs/1903.00686v1 |
http://arxiv.org/pdf/1903.00686v1.pdf | |
PWC | https://paperswithcode.com/paper/dimdraw-a-novel-tool-for-drawing-concept |
Repo | |
Framework | |
On educating machines
Title | On educating machines |
Authors | George Leu, Jiangjun Tang |
Abstract | Machine education is an emerging research field that focuses on the problem inverse to machine learning. To date, the literature on educating machines is still in its infancy. A fairly small number of methodology and method papers are scattered throughout various formal and informal publication avenues, mainly because the field is not yet well coalesced (with no well-established discussion forums or investigation pathways), but also due to the breadth of its potential ramifications and research directions. In this study we bring together the existing literature and organise the discussion into a small number of research directions (out of many) that are to date sufficiently explored to form a minimal critical mass capable of pushing machine education further towards the status of a standalone research field. |
Tasks | |
Published | 2019-09-13 |
URL | https://arxiv.org/abs/1909.06017v1 |
https://arxiv.org/pdf/1909.06017v1.pdf | |
PWC | https://paperswithcode.com/paper/on-educating-machines |
Repo | |
Framework | |
CDSA: Cross-Dimensional Self-Attention for Multivariate, Geo-tagged Time Series Imputation
Title | CDSA: Cross-Dimensional Self-Attention for Multivariate, Geo-tagged Time Series Imputation |
Authors | Jiawei Ma, Zheng Shou, Alireza Zareian, Hassan Mansour, Anthony Vetro, Shih-Fu Chang |
Abstract | Many real-world applications involve multivariate, geo-tagged time series data: at each location, multiple sensors record corresponding measurements. For example, an air quality monitoring system records PM2.5, CO, etc. The resulting time series data often have missing values due to device outages or communication errors. In order to impute the missing values, state-of-the-art methods are built on Recurrent Neural Networks (RNNs), which process each time stamp sequentially, prohibiting the direct modeling of the relationship between distant time stamps. Recently, the self-attention mechanism has been proposed for sequence modeling tasks such as machine translation, significantly outperforming RNNs because the relationship between any two time stamps can be modeled explicitly. In this paper, we are the first to adapt the self-attention mechanism to multivariate, geo-tagged time series data. In order to jointly capture self-attention across multiple dimensions, including time, location, and sensor measurements, while maintaining low computational complexity, we propose a novel approach called Cross-Dimensional Self-Attention (CDSA) that processes each dimension sequentially, yet in an order-independent manner. Our extensive experiments on four real-world datasets, including three standard benchmarks and our newly collected NYC-traffic dataset, demonstrate that our approach outperforms state-of-the-art imputation and forecasting methods. A detailed systematic analysis confirms the effectiveness of our design choices. |
Tasks | Imputation, Machine Translation, Time Series |
Published | 2019-05-23 |
URL | https://arxiv.org/abs/1905.09904v2 |
https://arxiv.org/pdf/1905.09904v2.pdf | |
PWC | https://paperswithcode.com/paper/cdsa-cross-dimensional-self-attention-for |
Repo | |
Framework | |
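As background for the CDSA entry above, plain scaled dot-product self-attention over a single dimension of a (time x measurement) series can be sketched as follows. This is a simplified illustration with identity Q/K/V projections; the paper's cross-dimensional mechanism and its complexity-reducing design are not reproduced here.

```python
import numpy as np

def self_attention(x):
    """Scaled dot-product self-attention over the first axis of x,
    e.g. the time axis of a (T, d) series. Identity projections are
    used for brevity; real models learn separate Q/K/V matrices."""
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)                      # similarity between every pair of time stamps
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over key positions
    return weights @ x                                 # each position mixes information from all others

# CDSA-style use would apply attention dimension by dimension
# (e.g. time, then location) in an order-independent manner.
series = np.random.randn(8, 4)   # T = 8 time stamps, d = 4 measurements
out = self_attention(series)     # same shape as the input, (8, 4)
```

Because every output position attends to every input position directly, relationships between distant time stamps are modeled in one step, which is the advantage over sequential RNN processing that the abstract highlights.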
Why Machines Cannot Learn Mathematics, Yet
Title | Why Machines Cannot Learn Mathematics, Yet |
Authors | André Greiner-Petter, Terry Ruas, Moritz Schubotz, Akiko Aizawa, William Grosky, Bela Gipp |
Abstract | Nowadays, Machine Learning (ML) is seen as the universal solution to improve the effectiveness of information retrieval (IR) methods. However, while mathematics is a precise and accurate science, it is usually expressed by less accurate and imprecise descriptions, contributing to the relative dearth of machine learning applications for IR in this domain. Generally, mathematical documents communicate their knowledge with an ambiguous, context-dependent, and non-formal language. Given recent advances in ML, it seems canonical to apply ML techniques to represent and retrieve mathematics semantically. In this work, we apply popular text embedding techniques to the arXiv collection of STEM documents and explore how these are unable to properly understand mathematics from that corpus. In addition, we also investigate the missing aspects that would allow mathematics to be learned by computers. |
Tasks | Information Retrieval |
Published | 2019-05-20 |
URL | https://arxiv.org/abs/1905.08359v1 |
https://arxiv.org/pdf/1905.08359v1.pdf | |
PWC | https://paperswithcode.com/paper/why-machines-cannot-learn-mathematics-yet |
Repo | |
Framework | |
Blackbox meets blackbox: Representational Similarity and Stability Analysis of Neural Language Models and Brains
Title | Blackbox meets blackbox: Representational Similarity and Stability Analysis of Neural Language Models and Brains |
Authors | Samira Abnar, Lisa Beinborn, Rochelle Choenni, Willem Zuidema |
Abstract | In this paper, we define and apply representational stability analysis (ReStA), an intuitive way of analyzing neural language models. ReStA is a variant of the popular representational similarity analysis (RSA) in cognitive neuroscience. While RSA can be used to compare representations in models, model components, and human brains, ReStA compares instances of the same model while systematically varying a single model parameter. Using ReStA, we study four recent and successful neural language models and evaluate how sensitive their internal representations are to the amount of prior context. Using RSA, we perform a systematic study of how similar the representational spaces in the first and second (or higher) layers of these models are to each other and to patterns of activation in the human brain. Our results reveal surprisingly strong differences between language models, and give insights into where the deep linguistic processing that integrates information over multiple sentences happens in these models. The combination of ReStA and RSA on models and brains allows us to start addressing the important question of what kind of linguistic processes we can hope to observe in fMRI brain imaging data. In particular, our results suggest that the data on story reading from Wehbe et al. (2014) contain a signal of shallow linguistic processing, but show no evidence of the more interesting deep linguistic processing. |
Tasks | |
Published | 2019-06-04 |
URL | https://arxiv.org/abs/1906.01539v2 |
https://arxiv.org/pdf/1906.01539v2.pdf | |
PWC | https://paperswithcode.com/paper/blackbox-meets-blackbox-representational |
Repo | |
Framework | |