Paper Group ANR 490
Towards Automatic Generation of Entertaining Dialogues in Chinese Crosstalks
Title | Towards Automatic Generation of Entertaining Dialogues in Chinese Crosstalks |
Authors | Shikang Du, Xiaojun Wan, Yajie Ye |
Abstract | Crosstalk, also known by its Chinese name xiangsheng, is a traditional Chinese comedic performing art featuring jokes and funny dialogues, and one of China’s most popular cultural elements. It typically takes the form of a dialogue between two performers for the purpose of bringing laughter to the audience, with one person acting as the leading comedian and the other as the supporting role. Though general dialogue generation has been widely explored in previous studies, it is unknown whether such entertaining dialogues can be generated automatically. In this paper, we investigate for the first time the possibility of automatically generating entertaining dialogues in Chinese crosstalks. Given the utterance of the leading comedian in each dialogue, our task is to generate the reply of the supporting role. We propose a humor-enhanced translation model to address this task, and human evaluation results demonstrate the efficacy of the proposed model. The feasibility of automatic entertaining dialogue generation is also verified. |
Tasks | Dialogue Generation |
Published | 2017-11-01 |
URL | http://arxiv.org/abs/1711.00294v1 |
PDF | http://arxiv.org/pdf/1711.00294v1.pdf |
PWC | https://paperswithcode.com/paper/towards-automatic-generation-of-entertaining |
Repo | |
Framework | |
Pattern recognition techniques for Boson Sampling validation
Title | Pattern recognition techniques for Boson Sampling validation |
Authors | Iris Agresti, Niko Viggianiello, Fulvio Flamini, Nicolò Spagnolo, Andrea Crespi, Roberto Osellame, Nathan Wiebe, Fabio Sciarrino |
Abstract | The difficulty of validating large-scale quantum devices, such as Boson Samplers, poses a major challenge for any research program that aims to show quantum advantages over classical hardware. To address this problem, we propose a novel data-driven approach wherein models are trained to identify common pathologies using unsupervised machine learning methods. We illustrate this idea by training a classifier that exploits K-means clustering to distinguish Boson Samplers that use indistinguishable photons from those that do not. We train the model on numerical simulations of small-scale Boson Samplers and then validate the pattern recognition technique on larger numerical simulations as well as on photonic chips in both traditional Boson Sampling and scattershot experiments. The effectiveness of this method relies on particle-type-dependent internal correlations present in the output distributions. This approach performs substantially better on the test data than previous methods and underscores its ability to generalize beyond the examples on which it was trained. |
Tasks | |
Published | 2017-12-19 |
URL | http://arxiv.org/abs/1712.06863v1 |
PDF | http://arxiv.org/pdf/1712.06863v1.pdf |
PWC | https://paperswithcode.com/paper/pattern-recognition-techniques-for-boson |
Repo | |
Framework | |
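The clustering primitive behind the classifier can be sketched with plain Lloyd's K-means. This is a generic sketch, not the authors' validation pipeline. The following are all assumptions:

- how output events are turned into feature vectors,
- the choice of `k`,
- the farthest-first initialization.

The step that compares how data from a trusted and a tested sampler populate the clusters is omitted.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_init(points: List[Point], k: int) -> List[Point]:
    """Deterministic seeding: start from the first point, then repeatedly add the point
    farthest from every centroid chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(squared_distance(p, c) for c in centroids)))
    return centroids


def kmeans(points: List[Point], k: int, max_iters: int = 100) -> Tuple[List[Point], List[int]]:
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid re-estimation."""
    centroids = farthest_first_init(points, k)
    labels: List[int] = []
    for _ in range(max_iters):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids are stable too
        labels = new_labels
        # Update step: move each centroid to the mean of its members (keep it if the cluster is empty).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return centroids, labels
```

With two well-separated groups of 2-D points and `k = 2`, the returned labels split the input into those two groups.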
Large scale digital prostate pathology image analysis combining feature extraction and deep neural network
Title | Large scale digital prostate pathology image analysis combining feature extraction and deep neural network |
Authors | Naiyun Zhou, Andrey Fedorov, Fiona Fennessy, Ron Kikinis, Yi Gao |
Abstract | Histopathological assessments, including surgical resection and core needle biopsy, are the standard procedures in the diagnosis of prostate cancer. Current interpretation of histopathology images includes determining the tumor area, Gleason grading, and identifying certain prognosis-critical features. Such a process is not only tedious but also prone to intra- and inter-observer variability. Recently, the FDA cleared the marketing of the first whole slide imaging system for digital pathology. This opens a new era for computer-aided prostate image analysis and feature extraction based on digital histopathology images. In this work, we present an analysis pipeline that includes localization of the cancer region, grading, computation of the area ratio of different Gleason grades, and cytological/architectural feature extraction. The proposed algorithm combines human-engineered features with those learned by a deep neural network. Moreover, the entire pipeline operates directly on the whole slide images produced by digital scanners and is therefore potentially easy to translate into clinical practice. The algorithm is tested on 368 whole slide images from the TCGA data set and achieves an overall accuracy of 75% in differentiating Gleason 3+4 from 4+3 slides. |
Tasks | |
Published | 2017-05-07 |
URL | http://arxiv.org/abs/1705.02678v2 |
PDF | http://arxiv.org/pdf/1705.02678v2.pdf |
PWC | https://paperswithcode.com/paper/large-scale-digital-prostate-pathology-image |
Repo | |
Framework | |
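One pipeline stage listed above, the area ratio of Gleason grades, reduces to simple counting once a grade map exists. This is a minimal sketch with two assumptions:

- The input is a hypothetical per-pixel map in which 0 means non-tumour and 3, 4, 5 are Gleason patterns.
- The primary pattern is the most prevalent one and the secondary pattern is the next most prevalent. This is the usual convention, but it simplifies the clinical rules.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def gleason_area_ratios(pixel_grades: Iterable[int]) -> Dict[int, float]:
    """Fraction of the tumour area covered by each Gleason pattern (0 = non-tumour, assumed encoding)."""
    counts = Counter(g for g in pixel_grades if g != 0)
    total = sum(counts.values())
    return {grade: n / total for grade, n in counts.items()} if total else {}


def primary_secondary(ratios: Dict[int, float]) -> Tuple[int, int]:
    """Most prevalent pattern first, next most prevalent second; a single pattern is repeated (e.g. 3+3)."""
    ranked = sorted(ratios, key=ratios.get, reverse=True)
    primary = ranked[0]
    secondary = ranked[1] if len(ranked) > 1 else primary
    return primary, secondary
```

Under these assumptions, a slide whose tumour area is 60% pattern 3 and 40% pattern 4 would be labelled 3+4. With the proportions reversed, it would be labelled 4+3.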
Approximate Computational Approaches for Bayesian Sensor Placement in High Dimensions
Title | Approximate Computational Approaches for Bayesian Sensor Placement in High Dimensions |
Authors | Xiao Lin, Asif Chowdhury, Xiaofan Wang, Gabriel Terejanu |
Abstract | Since the cost of installing and maintaining sensors is usually high, sensor locations are always strategically selected. When the goal is to infer certain quantities of interest (QoI), it is desirable to explore the dependency between sensor measurements and the QoI. One of the most popular metrics for this dependency is mutual information, which naturally measures how much information about one variable can be obtained given the other. However, computing mutual information is always challenging, and the result is unreliable in high dimensions. In this paper, we propose an approach to find an approximate lower bound of mutual information and compute it in a lower dimension. Sensors are then placed where the highest mutual information (lower bound) is achieved, and the QoI is inferred via Bayes' rule given the sensor measurements. In addition, Bayesian optimization is introduced to provide a continuous mutual information surface over the domain and thus reduce the number of evaluations. A chemical release accident is simulated in which multiple sensors are placed to locate the source of the release. The results show that the proposed approach is both effective and efficient in inferring the QoI. |
Tasks | |
Published | 2017-03-01 |
URL | http://arxiv.org/abs/1703.00368v2 |
PDF | http://arxiv.org/pdf/1703.00368v2.pdf |
PWC | https://paperswithcode.com/paper/170300368 |
Repo | |
Framework | |
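The place-then-infer loop can be illustrated in a setting where mutual information has a closed form. The sketch makes the following assumptions:

- The QoI is a scalar Gaussian, $q \sim \mathcal{N}(0, \sigma_q^2)$.
- The sensor model is linear-Gaussian, $y = g\,q + \varepsilon$, with a hypothetical gain `g` per candidate location.

It does not reproduce the paper's approximate lower bound or its Bayesian-optimization surrogate. Under these assumptions, $I(q; y_{1:m}) = \tfrac{1}{2}\ln(\sigma_q^2 / \sigma_{\text{post}}^2)$. Sensors are added greedily, and Bayes' rule gives the posterior in closed form.

```python
import math
from typing import Dict, List, Sequence, Tuple


def posterior_variance(gains: Sequence[float], prior_var: float, noise_var: float) -> float:
    # Precisions add for a Gaussian prior combined with independent Gaussian measurements.
    return 1.0 / (1.0 / prior_var + sum(g * g for g in gains) / noise_var)


def mutual_information(gains: Sequence[float], prior_var: float, noise_var: float) -> float:
    """I(q; y_1..y_m) in nats for y_i = g_i * q + noise, q ~ N(0, prior_var)."""
    return 0.5 * math.log(prior_var / posterior_variance(gains, prior_var, noise_var))


def greedy_placement(candidates: Dict[str, float], n_sensors: int,
                     prior_var: float, noise_var: float) -> List[str]:
    """Repeatedly add the candidate location that maximizes the mutual information of the chosen set."""
    chosen: List[str] = []
    for _ in range(n_sensors):
        remaining = [loc for loc in candidates if loc not in chosen]
        best = max(remaining, key=lambda loc: mutual_information(
            [candidates[c] for c in chosen + [loc]], prior_var, noise_var))
        chosen.append(best)
    return chosen


def posterior(gains: Sequence[float], readings: Sequence[float],
              prior_var: float, noise_var: float) -> Tuple[float, float]:
    """Bayes' rule for the linear-Gaussian model with prior mean 0: returns (mean, variance) of q."""
    var = posterior_variance(gains, prior_var, noise_var)
    mean = var * sum(g * y for g, y in zip(gains, readings)) / noise_var
    return mean, var
```

With a scalar QoI the greedy choice simply favors the largest gains. The structure only becomes interesting once $q$ is a vector, which is where the paper's high-dimensional approximation is needed.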
LIUM-CVC Submissions for WMT17 Multimodal Translation Task
Title | LIUM-CVC Submissions for WMT17 Multimodal Translation Task |
Authors | Ozan Caglayan, Walid Aransa, Adrien Bardet, Mercedes García-Martínez, Fethi Bougares, Loïc Barrault, Marc Masana, Luis Herranz, Joost van de Weijer |
Abstract | This paper describes the monomodal and multimodal Neural Machine Translation systems developed by LIUM and CVC for the WMT17 Shared Task on Multimodal Translation. We mainly explored two multimodal architectures in which either global visual features or convolutional feature maps are integrated in order to benefit from visual context. Our final systems ranked first for both En-De and En-Fr language pairs according to the automatic evaluation metrics METEOR and BLEU. |
Tasks | Machine Translation |
Published | 2017-07-14 |
URL | http://arxiv.org/abs/1707.04481v1 |
PDF | http://arxiv.org/pdf/1707.04481v1.pdf |
PWC | https://paperswithcode.com/paper/lium-cvc-submissions-for-wmt17-multimodal |
Repo | |
Framework | |
Stochastic Low-Rank Bandits
Title | Stochastic Low-Rank Bandits |
Authors | Branislav Kveton, Csaba Szepesvari, Anup Rao, Zheng Wen, Yasin Abbasi-Yadkori, S. Muthukrishnan |
Abstract | Many problems in computer vision and recommender systems involve low-rank matrices. In this work, we study the problem of finding the maximum entry of a stochastic low-rank matrix from sequential observations. At each step, a learning agent chooses pairs of row and column arms, and receives the noisy product of their latent values as a reward. The main challenge is that the latent values are unobserved. We identify a class of non-negative matrices whose maximum entry can be found statistically efficiently and propose an algorithm for finding them, which we call LowRankElim. We derive an $O((K + L) \operatorname{poly}(d) \Delta^{-1} \log n)$ upper bound on its $n$-step regret, where $K$ is the number of rows, $L$ is the number of columns, $d$ is the rank of the matrix, and $\Delta$ is the minimum gap. The bound depends on other problem-specific constants that clearly do not depend on $K L$. To the best of our knowledge, this is the first such result in the literature. |
Tasks | Recommendation Systems |
Published | 2017-12-13 |
URL | http://arxiv.org/abs/1712.04644v1 |
PDF | http://arxiv.org/pdf/1712.04644v1.pdf |
PWC | https://paperswithcode.com/paper/stochastic-low-rank-bandits |
Repo | |
Framework | |
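The sketch below makes the problem statement concrete in two parts:

- It simulates the observation model: the reward for a row-column pair is the inner product of their latent vectors plus noise.
- It computes the $n$-step regret against the maximum entry.

The simulator, its Gaussian noise, and the explore-then-commit policy are illustrative assumptions. The policy is a naive baseline, not LowRankElim.

```python
import random
from typing import Dict, List, Sequence, Tuple

Arm = Tuple[int, int]


class LowRankBandit:
    """Hypothetical simulator: row i has latent vector U[i], column j has V[j] (non-negative, rank d);
    pulling the pair (i, j) returns <U[i], V[j]> plus Gaussian noise."""

    def __init__(self, U: Sequence[Sequence[float]], V: Sequence[Sequence[float]],
                 noise_sd: float, seed: int = 0) -> None:
        self.U, self.V, self.noise_sd = U, V, noise_sd
        self.rng = random.Random(seed)

    def mean(self, i: int, j: int) -> float:
        return sum(a * b for a, b in zip(self.U[i], self.V[j]))

    def pull(self, i: int, j: int) -> float:
        return self.mean(i, j) + self.rng.gauss(0.0, self.noise_sd)

    def best_mean(self) -> float:
        return max(self.mean(i, j) for i in range(len(self.U)) for j in range(len(self.V)))


def expected_regret(env: LowRankBandit, pulls: Sequence[Arm]) -> float:
    """n-step regret: sum over steps of (maximum entry - mean reward of the chosen pair)."""
    best = env.best_mean()
    return sum(best - env.mean(i, j) for i, j in pulls)


def explore_then_commit(env: LowRankBandit, n: int, m: int) -> List[Arm]:
    """Naive baseline (not LowRankElim): sample every pair m times, then commit to the best."""
    K, L = len(env.U), len(env.V)
    if n < K * L * m:
        raise ValueError("budget n must cover the K*L*m exploration pulls")
    pulls: List[Arm] = []
    totals: Dict[Arm, float] = {}
    for i in range(K):
        for j in range(L):
            totals[(i, j)] = sum(env.pull(i, j) for _ in range(m))
            pulls.extend([(i, j)] * m)
    best = max(totals, key=totals.get)
    pulls.extend([best] * (n - len(pulls)))
    return pulls
```

Explore-then-commit spends $K L m$ pulls on exploration, so its regret grows with $K L$. The $(K + L)$ dependence in the bound above is what exploiting the low-rank structure is meant to achieve instead.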
Improving 6D Pose Estimation of Objects in Clutter via Physics-aware Monte Carlo Tree Search
Title | Improving 6D Pose Estimation of Objects in Clutter via Physics-aware Monte Carlo Tree Search |
Authors | Chaitanya Mitash, Abdeslam Boularias, Kostas E. Bekris |
Abstract | This work proposes a process for efficiently searching over combinations of individual object 6D pose hypotheses in cluttered scenes, especially in cases involving occlusions and objects resting on each other. The initial set of candidate object poses is generated from state-of-the-art object detection and global point cloud registration techniques. The best-scoring pose per object obtained with these techniques may not be accurate due to overlaps and occlusions. Indeed, experiments in this work indicate that object poses ranked lower by the registration techniques may be closer to the real poses than highly ranked ones. This motivates a global optimization process for improving these poses by taking into account scene-level physical interactions between objects. It also implies that the Cartesian product of candidate poses for interacting objects must be searched in order to identify the best scene-level hypothesis. To perform the search efficiently, the candidate poses for each object are clustered so as to reduce their number while still keeping sufficient diversity. Then, the combinations of candidate object poses are searched through a Monte Carlo Tree Search (MCTS) process. The score that guides this search is the similarity between the observed depth image of the scene and a rendering of the scene under the hypothesized poses. Using the Upper Confidence Bound (UCB) technique, MCTS handles the tradeoff between fine-tuning the most promising poses and exploring new ones in a principled way. Experimental results indicate that this process can quickly identify physically consistent object poses in cluttered scenes that are significantly closer to ground truth than the poses found by point cloud registration methods. |
Tasks | 6D Pose Estimation, 6D Pose Estimation using RGB, Object Detection, Point Cloud Registration, Pose Estimation |
Published | 2017-10-24 |
URL | http://arxiv.org/abs/1710.08577v1 |
PDF | http://arxiv.org/pdf/1710.08577v1.pdf |
PWC | https://paperswithcode.com/paper/improving-6d-pose-estimation-of-objects-in |
Repo | |
Framework | |
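The UCB rule that MCTS uses to balance refinement and exploration can be sketched as UCB1 child selection. In the setting above, each tree level would fix the pose of one more object, and the score would be the depth-image similarity of the rendered scene. Here `Node`, the constant `c = sqrt(2)`, and the score range are generic assumptions rather than the paper's settings.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One search-tree node; in the pose setting a child would fix one more object's pose (assumption)."""
    visits: int = 0
    total_score: float = 0.0
    children: List["Node"] = field(default_factory=list)


def ucb1(child: Node, parent_visits: int, c: float = math.sqrt(2.0)) -> float:
    """Mean score (exploitation) plus a bonus that shrinks as the child is visited more (exploration)."""
    if child.visits == 0:
        return math.inf  # try every child at least once
    exploit = child.total_score / child.visits
    explore = c * math.sqrt(math.log(parent_visits) / child.visits)
    return exploit + explore


def select_child(parent: Node, c: float = math.sqrt(2.0)) -> Node:
    return max(parent.children, key=lambda ch: ucb1(ch, parent.visits, c))


def backpropagate(path: List[Node], score: float) -> None:
    """Credit a rollout's score (e.g. a depth-image similarity in [0, 1]) to every node on the path."""
    for node in path:
        node.visits += 1
        node.total_score += score
```

Consider a parent visited 10 times with two children: one has a mean score of 0.8 over 5 visits, the other a mean of 0.5 over a single visit. UCB1 selects the rarely visited child, because its exploration bonus outweighs the gap in mean score.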
Exponential Moving Average Model in Parallel Speech Recognition Training
Title | Exponential Moving Average Model in Parallel Speech Recognition Training |
Authors | Xu Tian, Jun Zhang, Zejun Ma, Yi He, Juan Wei |
Abstract | As training data grow rapidly, large-scale parallel training on multi-GPU clusters is now widely applied to neural network model learning. We present a new approach that applies the exponential moving average method to large-scale parallel training of neural network models. It is a non-interference strategy: the exponential moving average model is not broadcast to the distributed workers to update their local models after model synchronization during training, and it serves as the final model of the training system. Fully-connected feed-forward neural networks (DNNs) and deep unidirectional Long Short-Term Memory (LSTM) recurrent neural networks (RNNs) are successfully trained with the proposed method for large vocabulary continuous speech recognition on Shenma voice search data in Mandarin. The character error rate (CER) of Mandarin speech recognition is further reduced compared with state-of-the-art parallel training approaches. |
Tasks | Large Vocabulary Continuous Speech Recognition, Speech Recognition |
Published | 2017-03-03 |
URL | http://arxiv.org/abs/1703.01024v1 |
PDF | http://arxiv.org/pdf/1703.01024v1.pdf |
PWC | https://paperswithcode.com/paper/exponential-moving-average-model-in-parallel |
Repo | |
Framework | |
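The non-interference idea can be sketched with scalar parameter lists:

- Workers synchronize as usual.
- A separate shadow copy tracks an exponential moving average of the synchronized model.
- Only that shadow copy is returned as the final model.

Three elements are assumptions: plain parameter averaging as the synchronization step, the `decay` value, and the toy `local_step`. The abstract does not specify the paper's actual synchronization scheme.

```python
from typing import Callable, List, Sequence

Params = List[float]


def average_workers(worker_params: Sequence[Sequence[float]]) -> Params:
    """Synchronization step (assumed here to be plain model averaging)."""
    n = len(worker_params)
    return [sum(values) / n for values in zip(*worker_params)]


class EmaTracker:
    """Shadow copy holding an exponential moving average of the synchronized model."""

    def __init__(self, init: Sequence[float], decay: float) -> None:
        self.decay = decay
        self.shadow: Params = list(init)

    def update(self, params: Sequence[float]) -> None:
        d = self.decay
        self.shadow = [d * s + (1.0 - d) * p for s, p in zip(self.shadow, params)]


def train_with_ema(worker_params: List[Params], local_step: Callable[[Params], Params],
                   rounds: int, decay: float) -> Params:
    ema = EmaTracker(average_workers(worker_params), decay)
    for _ in range(rounds):
        worker_params = [local_step(p) for p in worker_params]  # independent local training
        synced = average_workers(worker_params)
        ema.update(synced)
        # Non-interference: workers resume from the synchronized model, never from the EMA.
        worker_params = [list(synced) for _ in worker_params]
    return ema.shadow  # the EMA, not the last synchronized model, is the final model
```

Keeping the EMA out of the workers' update path means it cannot disturb their optimization. It only smooths the model that is finally deployed.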
Recovering 6D Object Pose: A Review and Multi-modal Analysis
Title | Recovering 6D Object Pose: A Review and Multi-modal Analysis |
Authors | Caner Sahin, Tae-Kyun Kim |
Abstract | A large number of studies analyse object detection and pose estimation at the visual level in 2D, discussing the effects of challenges such as occlusion, clutter, and texture on the performance of methods that work with the RGB modality. Also interpreting depth data, this paper presents thorough multi-modal analyses. It discusses the above-mentioned challenges for full 6D object pose estimation in RGB-D images, comparing the performance of several 6D detectors in order to answer the following questions: What is the current position of the computer vision community for maintaining “automation” in robotic manipulation? What next steps should the community take to improve “autonomy” in robotics while handling objects? Our findings include: (i) reasonably accurate results are obtained on textured objects at varying viewpoints with cluttered backgrounds; (ii) heavy occlusion and clutter severely affect the detectors, and similar-looking distractors are the biggest challenge in recovering instances’ 6D poses; (iii) template-based methods and random forest-based learning algorithms underlie object detection and 6D pose estimation, and the recent paradigm is to learn deep discriminative feature representations and to adopt CNNs taking RGB images as input; (iv) depending on the availability of large-scale 6D annotated depth datasets, feature representations can be learnt on these datasets, and the learnt representations can then be customized for the 6D problem. |
Tasks | 6D Pose Estimation, 6D Pose Estimation using RGB, Object Detection, Pose Estimation |
Published | 2017-06-10 |
URL | http://arxiv.org/abs/1706.03285v2 |
PDF | http://arxiv.org/pdf/1706.03285v2.pdf |
PWC | https://paperswithcode.com/paper/recovering-6d-object-pose-a-review-and-multi |
Repo | |
Framework | |
Long-Range Correlation Underlying Childhood Language and Generative Models
Title | Long-Range Correlation Underlying Childhood Language and Generative Models |
Authors | Kumiko Tanaka-Ishii |
Abstract | Long-range correlation, a property of time series exhibiting long-term memory, is mainly studied in the statistical physics domain and has been reported to exist in natural language. Using a state-of-the-art method for such analysis, long-range correlation is first shown to occur in long CHILDES data sets. To understand why, Bayesian generative models of language, originally proposed in the cognitive science domain, are investigated. Among representative models, the Simon model was found to exhibit surprisingly good long-range correlation, whereas the Pitman-Yor model did not. Since the Simon model is known not to reflect the vocabulary growth of natural language correctly, a simple new model is devised by combining the Simon and Pitman-Yor models, such that long-range correlation holds with a correct vocabulary growth rate. Overall, the investigation suggests that uniform sampling is one cause of long-range correlation and could thus be related to actual linguistic processes. |
Tasks | Time Series |
Published | 2017-12-11 |
URL | http://arxiv.org/abs/1712.03645v1 |
PDF | http://arxiv.org/pdf/1712.03645v1.pdf |
PWC | https://paperswithcode.com/paper/long-range-correlation-underlying-childhood |
Repo | |
Framework | |
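Simon's model is simple enough to sketch directly:

- With probability `new_word_prob`, a new word is introduced.
- Otherwise, a token is copied from a uniformly sampled earlier position. This is the uniform sampling the abstract points to.

Word ids stand in for words, and the parameter names are illustrative. A new word appears at each step with fixed probability, so the expected vocabulary grows linearly (roughly `new_word_prob * length`). This is the vocabulary-growth mismatch mentioned above.

```python
import random
from typing import List


def simon_model(length: int, new_word_prob: float, seed: int = 0) -> List[int]:
    """Generate `length` word ids with Simon's rich-get-richer process."""
    rng = random.Random(seed)
    text = [0]  # start from a single word
    next_id = 1
    while len(text) < length:
        if rng.random() < new_word_prob:
            text.append(next_id)  # introduce a brand-new word
            next_id += 1
        else:
            # Uniform sampling over all earlier tokens: frequent words are proportionally likelier.
            text.append(text[rng.randrange(len(text))])
    return text[:length]
```

Feeding such sequences to a long-range correlation analysis is how one would reproduce the kind of comparison described in the abstract. The analysis method itself is not sketched here.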
T-LESS: An RGB-D Dataset for 6D Pose Estimation of Texture-less Objects
Title | T-LESS: An RGB-D Dataset for 6D Pose Estimation of Texture-less Objects |
Authors | Tomas Hodan, Pavel Haluza, Stepan Obdrzalek, Jiri Matas, Manolis Lourakis, Xenophon Zabulis |
Abstract | We introduce T-LESS, a new public dataset for estimating the 6D pose, i.e. translation and rotation, of texture-less rigid objects. The dataset features thirty industry-relevant objects with no significant texture and no discriminative color or reflectance properties. The objects exhibit symmetries and mutual similarities in shape and/or size. Compared to other datasets, a unique property is that some of the objects are parts of others. The dataset includes training and test images that were captured with three synchronized sensors, specifically a structured-light and a time-of-flight RGB-D sensor and a high-resolution RGB camera. There are approximately 39K training and 10K test images from each sensor. Additionally, two types of 3D models are provided for each object, i.e. a manually created CAD model and a semi-automatically reconstructed one. Training images depict individual objects against a black background. Test images originate from twenty test scenes having varying complexity, which increases from simple scenes with several isolated objects to very challenging ones with multiple instances of several objects and with a high amount of clutter and occlusion. The images were captured from a systematically sampled view sphere around the object/scene, and are annotated with accurate ground truth 6D poses of all modeled objects. Initial evaluation results indicate that the state of the art in 6D object pose estimation has ample room for improvement, especially in difficult cases with significant occlusion. The T-LESS dataset is available online at cmp.felk.cvut.cz/t-less. |
Tasks | 6D Pose Estimation, 6D Pose Estimation using RGB, Pose Estimation |
Published | 2017-01-19 |
URL | http://arxiv.org/abs/1701.05498v1 |
PDF | http://arxiv.org/pdf/1701.05498v1.pdf |
PWC | https://paperswithcode.com/paper/t-less-an-rgb-d-dataset-for-6d-pose |
Repo | |
Framework | |
Simultaneous Matrix Diagonalization for Structural Brain Networks Classification
Title | Simultaneous Matrix Diagonalization for Structural Brain Networks Classification |
Authors | Nikita Mokrov, Maxim Panov, Boris A. Gutman, Joshua I. Faskowitz, Neda Jahanshad, Paul M. Thompson |
Abstract | This paper considers the problem of brain disease classification based on connectome data. A connectome is a network representation of a human brain. The typical connectome classification problem is very challenging because of the small sample size and high dimensionality of the data. We propose to use simultaneous approximate diagonalization of adjacency matrices in order to compute their eigenstructures in a more stable way. The obtained approximate eigenvalues are then used as features for classification. The proposed approach is demonstrated to be effective for the detection of Alzheimer’s disease, outperforming simple baselines and competing with state-of-the-art approaches to brain disease classification. |
Tasks | |
Published | 2017-10-14 |
URL | http://arxiv.org/abs/1710.05213v1 |
PDF | http://arxiv.org/pdf/1710.05213v1.pdf |
PWC | https://paperswithcode.com/paper/simultaneous-matrix-diagonalization-for |
Repo | |
Framework | |
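A crude version of the shared-eigenbasis idea works in two steps. First, diagonalize the mean adjacency matrix once. Then read each subject's approximate eigenvalues off the diagonal of $V^\top A V$ in that common basis, so that features are consistently ordered across subjects.

This is a stand-in chosen for brevity, not the paper's simultaneous diagonalization algorithm. The eigenvectors come from a small pure-Python Jacobi solver, and the result is exact only when the matrices commute.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def transpose(x: Matrix) -> Matrix:
    return [list(row) for row in zip(*x)]


def jacobi_eigenvectors(a: Matrix, max_sweeps: int = 50, tol: float = 1e-20) -> Matrix:
    """Cyclic Jacobi method for a symmetric matrix; the columns of the result are eigenvectors."""
    n = len(a)
    a = [row[:] for row in a]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Angle that zeroes entry (p, q) of rot^T A rot.
                theta = 0.5 * math.atan2(2.0 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                rot = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
                rot[p][p], rot[q][q], rot[p][q], rot[q][p] = c, c, s, -s
                a = matmul(transpose(rot), matmul(a, rot))
                v = matmul(v, rot)
    return v


def shared_basis_features(mats: List[Matrix]) -> List[List[float]]:
    """Approximate eigenvalues of each matrix, read off in the eigenbasis of the mean matrix."""
    n = len(mats[0])
    mean = [[sum(m[i][j] for m in mats) / len(mats) for j in range(n)] for i in range(n)]
    v = jacobi_eigenvectors(mean)
    vt = transpose(v)
    features = []
    for m in mats:
        d = matmul(vt, matmul(m, v))  # nearly diagonal if m almost shares the mean's eigenvectors
        features.append([d[i][i] for i in range(n)])
    return features
```

The per-subject feature vectors returned here would then be fed to any standard classifier. That step is not shown.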
Reverse Curriculum Generation for Reinforcement Learning
Title | Reverse Curriculum Generation for Reinforcement Learning |
Authors | Carlos Florensa, David Held, Markus Wulfmeier, Michael Zhang, Pieter Abbeel |
Abstract | Many relevant tasks require an agent to reach a certain state or to manipulate objects into a desired configuration. For example, we might want a robot to align and assemble a gear onto an axle or insert and turn a key in a lock. These goal-oriented tasks present a considerable challenge for reinforcement learning, since their natural reward function is sparse and prohibitive amounts of exploration are required to reach the goal and receive some learning signal. Past approaches tackle these problems by exploiting expert demonstrations or by manually designing a task-specific reward shaping function to guide the learning agent. Instead, we propose a method to learn these tasks without requiring any prior knowledge other than a single state in which the task is achieved. The robot is trained in reverse, gradually learning to reach the goal from a set of start states increasingly far from the goal. Our method automatically generates a curriculum of start states that adapts to the agent’s performance, leading to efficient training on goal-oriented tasks. We demonstrate our approach on difficult simulated navigation and fine-grained manipulation problems that are not solvable by state-of-the-art reinforcement learning methods. |
Tasks | |
Published | 2017-07-17 |
URL | http://arxiv.org/abs/1707.05300v3 |
PDF | http://arxiv.org/pdf/1707.05300v3.pdf |
PWC | https://paperswithcode.com/paper/reverse-curriculum-generation-for |
Repo | |
Framework | |
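The reverse-curriculum loop can be sketched on a one-dimensional integer state space, with the goal as the only given state. Everything about the agent is hypothetical:

- `success_rate` replaces policy rollouts.
- The `skill` increment replaces the RL update.
- The success band `[r_min, r_max] = [0.1, 0.9]` is an assumed choice, not a value taken from the abstract.
- The random-walk proposal of new starts is likewise an assumed choice.

```python
import random
from typing import List


def success_rate(distance: int, skill: float) -> float:
    """Hypothetical stand-in for evaluating the current policy from a start `distance` steps from the goal."""
    return max(0.0, min(1.0, 1.0 - distance / (2.0 * skill)))


def random_walk_starts(starts: List[int], n_new: int, steps: int, rng: random.Random) -> List[int]:
    """Propose new start states by taking short random walks from the current starts."""
    proposals = []
    for _ in range(n_new):
        state = rng.choice(starts)
        for _ in range(steps):
            state += rng.choice((-1, 1))
        proposals.append(state)
    return proposals


def reverse_curriculum(goal: int, iterations: int, seed: int = 0, r_min: float = 0.1,
                       r_max: float = 0.9, n_new: int = 20, steps: int = 3) -> List[int]:
    rng = random.Random(seed)
    starts = [goal]  # the only prior knowledge: one state where the task is achieved
    skill = 1.0
    for _ in range(iterations):
        candidates = starts + random_walk_starts(starts, n_new, steps, rng)
        # Keep starts of intermediate difficulty: neither always solved nor never solved.
        good = [s for s in candidates if r_min <= success_rate(abs(s - goal), skill) <= r_max]
        if good:
            starts = sorted(set(good))
        # Placeholder for the RL update on episodes launched from `starts`: the hypothetical
        # policy simply becomes competent a little farther from the goal.
        skill += 1.0
    return starts
```

Filtering by success rate is what makes the curriculum adaptive. Starts the agent always solves carry little learning signal, and starts it never solves give no reward, so only the band in between is kept.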
Automated Top View Registration of Broadcast Football Videos
Title | Automated Top View Registration of Broadcast Football Videos |
Authors | Rahul Anand Sharma, Bharath Bhat, Vineet Gandhi, C. V. Jawahar |
Abstract | In this paper, we propose a novel method to register football broadcast video frames onto the static top view model of the playing surface. The proposed method is fully automatic, in contrast to the current state of the art, which requires manual initialization of point correspondences between the image and the static model. Automatic registration using existing approaches has been difficult due to the lack of sufficient point correspondences. We investigate an alternative approach that exploits the edge information from the line markings on the field. We formulate the registration problem as a nearest neighbour search over a synthetically generated dictionary of edge map and homography pairs. The synthetic dictionary generation allows us to exhaustively cover a wide variety of camera angles and positions and to reduce this problem to a minimal per-frame edge map matching procedure. We show that the per-frame results can be improved in videos using an optimization framework for temporal camera stabilization. We demonstrate the efficacy of our approach by presenting extensive results on a dataset collected from matches of the 2014 football World Cup. |
Tasks | Bird View Synthesis, Homography Estimation |
Published | 2017-03-04 |
URL | http://arxiv.org/abs/1703.01437v1 |
PDF | http://arxiv.org/pdf/1703.01437v1.pdf |
PWC | https://paperswithcode.com/paper/automated-top-view-registration-of-broadcast |
Repo | |
Framework | |
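Registration as a nearest-neighbour lookup can be sketched in three steps:

- Each dictionary entry pairs a flattened edge map with the homography that produced it.
- The query frame's edge map is matched by squared Euclidean distance.
- The retrieved homography maps image points to the top view.

The raw-pixel distance and the tuple-based 3x3 homography are simplifying assumptions. The abstract does not specify the edge-map descriptor, and the temporal stabilization step is omitted.

```python
from typing import List, Sequence, Tuple

Homography = Tuple[Tuple[float, float, float], ...]  # 3x3, row-major


def nearest_homography(query_edges: Sequence[float],
                       dictionary: List[Tuple[Sequence[float], Homography]]) -> Homography:
    """Return the homography paired with the dictionary edge map closest to the query."""
    def distance(entry: Tuple[Sequence[float], Homography]) -> float:
        edges, _ = entry
        return sum((a - b) ** 2 for a, b in zip(query_edges, edges))
    return min(dictionary, key=distance)[1]


def to_top_view(h: Homography, x: float, y: float) -> Tuple[float, float]:
    """Apply a homography to an image point using homogeneous coordinates."""
    u = h[0][0] * x + h[0][1] * y + h[0][2]
    v = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return u / w, v / w
```

Because the dictionary is generated synthetically, its coverage of camera poses is a design choice. A linear scan as above is fine for a sketch, but a large dictionary would call for an approximate nearest-neighbour index.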
Boosting the accuracy of multi-spectral image pan-sharpening by learning a deep residual network
Title | Boosting the accuracy of multi-spectral image pan-sharpening by learning a deep residual network |
Authors | Yancong Wei, Qiangqiang Yuan, Huanfeng Shen, Liangpei Zhang |
Abstract | In the field of fusing multi-spectral and panchromatic images (pan-sharpening), deep neural networks have recently been employed, with impressive effectiveness, to overcome the drawbacks of traditional linear models and boost fusion accuracy. However, to the best of our knowledge, existing research works are mainly based on simple and flat networks with relatively shallow architectures, which severely limits their performance. In this paper, residual learning is introduced to form a very deep convolutional neural network that makes full use of the high non-linearity of deep learning models. Both quantitative and visual assessments on a large number of high-quality multi-spectral images from various sources show that our proposed model is superior to all mainstream algorithms included in the comparison and achieves the highest spatial-spectral unified accuracy. |
Tasks | |
Published | 2017-05-22 |
URL | http://arxiv.org/abs/1705.07556v2 |
PDF | http://arxiv.org/pdf/1705.07556v2.pdf |
PWC | https://paperswithcode.com/paper/boosting-the-accuracy-of-multi-spectral-image |
Repo | |
Framework | |