Locality-Conscious Load Balancing: Connectionist Architectural Support

Authors: Tumuluri, Chaitanya; Choudhary, Alok N.; Mohan, Chilukuri K.
Type: technical report
Date issued: 1996-01
Date added to repository: 2007-04-04
Language: en-US
Keywords: theory center
URI: http://techreports.library.cornell.edu:8081/Dienst/UI/1.0/Display/cul.tc/96-231
URI: https://hdl.handle.net/1813/5565
Formats: application/pdf; application/postscript
File sizes: 709304 bytes; 673356 bytes

Abstract:
Traditionally, in distributed memory architectures, locality maintenance and load balancing are treated as user-level activities that rely on compiler and runtime-system support in software. Such software solutions require an explicit execution phase during which the application must suspend its activities. This paper presents the first (to our knowledge) architecture-level scheme for extracting locality concurrently with application execution. An artificial neural network coprocessor dynamically monitors processor reference streams to learn the temporally emergent utilities of data elements in ongoing local computations. This facilitates the use of kernel-level load-balancing schemes, thus easing the user's programming burden. During load balancing, the kernel-level scheme migrates data to the processor memories that show higher utilities for it. Results from an execution-driven simulation of the proposed coprocessor are presented for three applications. The applications were chosen to represent the range of load and locality fluxes encountered in parallel programs: (a) static locality and load characteristics, (b) slowly varying localities for fixed dataset sizes, and (c) rapidly fluctuating localities among slowly varying dataset sizes. The performance results indicate the viability and success of the coprocessor in concurrently extracting locality for use in load-balancing activities.
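The abstract does not specify the coprocessor's network topology or learning rule. As a rough illustration only, the following sketch swaps the artificial neural network for a much simpler stand-in: an exponentially decayed reference count per (data element, processor) pair. It then shows how such utility estimates could rank destinations when data is migrated during a load-balancing step. All names (`UtilityTracker`, `observe`, `best_home`, `propose_migrations`, `decay`) are hypothetical and do not come from the report.

```python
from collections import defaultdict


class UtilityTracker:
    """Hypothetical stand-in for the ANN coprocessor (NOT the report's design):
    per-element utility estimates built from exponentially decayed reference counts."""

    def __init__(self, num_procs: int, decay: float = 0.9):
        self.num_procs = num_procs
        self.decay = decay  # assumed forgetting factor, 0 < decay < 1
        # utility[element][p]: recent-reference score of processor p for element
        self.utility = defaultdict(lambda: [0.0] * num_procs)

    def observe(self, proc: int, element) -> None:
        """Record one reference to `element` by processor `proc`."""
        scores = self.utility[element]
        for p in range(self.num_procs):
            scores[p] *= self.decay  # older references to this element count for less
        scores[proc] += 1.0

    def best_home(self, element) -> int:
        """Processor with the highest utility for `element` (lowest index on ties)."""
        scores = self.utility[element]
        return max(range(self.num_procs), key=lambda p: scores[p])


def propose_migrations(tracker: UtilityTracker, placement: dict) -> dict:
    """Propose moving each observed element to the processor whose utility
    strictly exceeds that of the element's current home processor."""
    moves = {}
    for element, home in placement.items():
        if element not in tracker.utility:  # never referenced: leave in place
            continue
        best = tracker.best_home(element)
        scores = tracker.utility[element]
        if scores[best] > scores[home]:
            moves[element] = best
    return moves


if __name__ == "__main__":
    tracker = UtilityTracker(num_procs=2)
    for _ in range(5):
        tracker.observe(1, "a")  # processor 1 references "a" repeatedly
    tracker.observe(0, "a")      # then processor 0 touches it once
    print(propose_migrations(tracker, {"a": 0, "b": 1}))  # {'a': 1}
```

Here element "a", currently homed on processor 0, is referenced mostly by processor 1, so it is proposed for migration there; "b" has no recorded references and stays put. A real locality-conscious balancer would also weigh processor load before accepting a move; this sketch only ranks candidate destinations by utility.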