ELEANOR ROOSEVELT AR

A Thesis Presented to the Faculty of the Graduate School of Cornell University in Partial Fulfillment of the Requirements for the Degree of MSc.

by Zhiling Hu, Ananya Paul, Meera Nanda

May 2020

© 2020 Zhiling Hu, Ananya Paul, Meera Nanda
ALL RIGHTS RESERVED

Figure 1: Eleanor Roosevelt AR as designed by Michael Byrne

ABSTRACT

For this project, we chose to focus on the absence of Eleanor Roosevelt (ER) at Four Freedoms Park (FFP) in New York City, despite her active political participation during Franklin D. Roosevelt’s presidency. Our goal was to communicate this participation using Augmented Reality (AR) and the power of dance. We conducted various forms of research to determine the best approach and found that an interactive mobile AR app was the most suitable, because it lets each user have their own experience. Our experimentation with AR also helped us assess the limitations of what could be implemented within this time frame.

CHAPTER 1
INTRODUCTION

Eleanor Roosevelt, the former First Lady of the United States, was an extraordinary voice for social justice and peace. Her legacy serves as a conduit to highlight the legacies of other trailblazing women from history. For example, at the invitation of ER, Martha Graham, a world leader in the evolving art form of modern dance, performed some of her Americana pieces at the White House in 1937, which indicates ER’s belief in using dance as a form of diplomacy.

This interdisciplinary project aims to remedy the absence of ER on Roosevelt Island using augmented reality. We aim to explore the embodied performance of dance at Roosevelt Island’s Four Freedoms Park in a way that brings to life the character and works of ER (and potentially the contributions of other pioneering women). We hope to collaborate with curators and choreographers in creating engaging content that celebrates Eleanor’s 1948 co-authoring of the UN’s Universal Declaration of Human Rights.
We created a series of research questions to help us tackle this project. They range from embodying historical values to upgrading the passive visitor experience into active engagement through the design of open-ended interactions.

• How can we bring FFP to life with history, dance and women’s empowerment using technology such as Augmented Reality?
• Can we use historic buildings that have specific features as surfaces on which to project AR?
• How can we create a valuable experience for visitors, both physically and virtually, which prompts an exchange for the work?

CHAPTER 2
RELATED WORK

For our research we chose to start by looking into AR products and technology that exist in the realms of dance, women’s empowerment, museums and legal aspects (see [1] in the Bibliography for a more detailed research paper).

2.1 Dance

There is extensive research in the realm of dance performance, largely supported by emerging technologies such as motion tracking cameras. We found projects such as “Dust” [2] (Figure 2.1), which uses a VR headset to place the audience in the immediate presence of the dancer with Kinect depth sensors. We also saw “Make the Line Dance” [3] (Figure 2.2), which uses Kinect to track human skeletons and then maps a video layer over a human body with QC, MadMapper and MaxForLive.

Figure 2.1: Capture from Dust

Figure 2.2: Capture from Make the Line Dance

2.2 Women’s Empowerment

Even though women represent half the workforce, there are far fewer immersive experiences in this realm. “Notable Women” [4] (Figure 2.3) is an AR app that adds 100 prominent American women to US currency. “The Whole Story” is another project, which allows users to add AR statues of women throughout Central Park.

Figure 2.3: Harriet Tubman on $1 bill from “Notable Women”

2.3 Museums and Heritage

AR is a prominent emerging technology used in several museum installations, exploring ways to enliven archival exhibits.
The “Skin and Bones” [5] (Figure 2.4) exhibit at the Smithsonian Museum is an AR app that, when focused on the bones of prehistoric animals on display, brings them to life. “MoMAR” [6] is another useful open source project that makes paintings at the MoMA interactive to users.

Figure 2.4: Capture from Skin and Bones

2.4 Legal

Laws around AR and art are currently very fuzzy. Copyright laws may apply when presenting other artists’ work in alternate locations without explicit permission. There are also potential issues with the Visual Artists Rights Act when it comes to augmenting artists’ work. Concepts such as “virtual trespassing” have been introduced recently due to games such as Pokémon Go interfering with people’s personal spaces without permission.

CHAPTER 3
APPROACH, METHOD AND IMPLEMENTATION

We approached this project as if we were building a consumer product, and therefore went through several stages of research and design before we began the technical implementation, as described below.

3.1 Research

3.1.1 Market Research

We began this part of our research by exploring existing work in the AR universe. The initial investigation was online, and all relevant articles and projects were compiled into a consolidated research document that can be found at the end of this report. The next step was to reach out to people in the AR industry whom we found while conducting this research, to learn more about their projects and inform our own. This research also included a deep dive into the technical capabilities of augmented reality. We explored the opportunities for development within video see-through AR, spatial projection and optical see-through AR.

3.1.2 User Research

We started with a survey to assess how locals and tourists go about exploring New York City and to better understand what types of exhibits draw the biggest crowds.
We then conducted contextual inquiries at Four Freedoms Park, the location for the launch of our product. We also met with the staff at FFP to learn about visitor behavior.

3.1.3 Location Study

In a similar fashion, we conducted observational studies at FFP, the intended venue for our project launch, to see how users interact with the space there. We also met with Madeline Grimes, a member of the park staff, to better understand the space. Through this, we gathered extremely useful information regarding peak visitation times, how we were allowed to use and interact with the space, and what modifications we were allowed to make. We also received schematics for the park to help us better design our product.

3.1.4 Experiments

We discovered from our contextual inquiries that visitors to FFP tend to take a lot of pictures, mostly of the city skyline and directed outwards, using their phones as well as stand-alone cameras. Visitors from New York City come to relax and for leisure, while tourists primarily visit for a different view of the city. From talking to the staff at the park we also learned that visitors mostly come in the afternoon. Third parties sometimes host events in the evening after hours or during the day. The chart (Appendix 1) gives a distribution of cumulative footfall spread over the months from 2013 to 2019, as provided by the FFP authorities. From talking to the authorities we also learned that they are welcoming and open to installations that we might need for our project.

Once we had settled on our final product idea, the next step was to create experiments to test its feasibility. We decided on the following three experiments:

• Can we detect a vertical plane and project a rectangle onto it?
• Can we detect the United Nations (UN) building from FFP and project a rectangle onto it?
• Can we use gesture recognition to trigger a projection?
We each went through various tutorials and trials to get results for these, which are detailed below.

Experiment 1: Can we detect a vertical plane and project a rectangle onto it?

We used a tutorial by Next Reality [7] to first identify a vertical plane. The resulting image of the plane (Figure 3.1) was produced primarily in Xcode using ARKit [8]. We found that we were able to identify salient vertical planes where the edges are clear, as in Figure 3.1. However, when we directed the phone’s camera at a wall where the edges were not clear, detection did not work as effectively. This may not prove to be a problem for us because the panels we intend to use at the park do have clearly marked edges. We were then able to project a demo poster onto the plane (Figure 3.2). It is also worth noting that the angle of the plane is correctly identified, and the projected image is angled to match.

Figure 3.1: Plane detection

Figure 3.2: Poster placed on plane

Experiment 2: Can we detect the UN building from FFP and project a rectangle onto it?

We first tested the existing ARKit and found that the maximum capture distance is around 1.5 meters. We therefore realized we needed a more exploratory attempt at “world scale” AR. After some research, we moved forward with the Mapbox library [9], aiming to inform ARKit of global positioning using location data from the phone. This allows items to be placed within the AR world using real-world coordinates. We checked out the new Mapbox iOS + ARKit library [10] and referred to the beginner’s tutorial [11]. The demo combines Mapbox maps and location services with ARKit in the application. Objects are anchored to GPS coordinates so that Augmented Reality experiences can work outdoors over long distances instead of only within a limited distance around the device. Because of weather conditions, we only managed to test it indoors.
(Figure 3.3)

Figure 3.3: Location detection

Experiment 3: Can we use gesture recognition to trigger a projection?

We followed a tutorial [12] for gesture recognition in ARKit and CoreML. We decided on a set of gestures for our experiment and collected a total of 255 images for each gesture under different lighting and backgrounds. We then trained a model on the images in Microsoft’s Custom Vision [13] and achieved 100% precision. We tested the model to see whether a gesture was being recognized and achieved 100% prediction accuracy on our test (Figure 3.4).

Figure 3.4: Prediction accuracy

We then wanted to build the Unity application to launch on mobile. Due to lack of resources, we could not test with an iPhone and therefore tested with an Android application. Following SJ-Wolf’s tutorial [14], we were able to correctly detect objects with their given dataset (Figure 3.5). However, we were not able to detect gestures with the TensorFlow model that we had trained previously with Custom Vision. We want to try gesture detection over the next semester with ARKit once we have the necessary resources, and hope to get gesture recognition working with ARKit.

Figure 3.5: Object recognition

3.2 Implementation

3.2.1 Design

We conducted an internal design sprint to finalize our product requirements and functionality. After going through individual brainstorming sessions, the three of us came up with approximately 15 unique ideas each. We then created an affinity diagram to organize our ideas and narrowed them down to our top three. We created thorough pitches of each idea to flesh out the features we could include and were then able to select the best product idea.

The goal was to incorporate the spirit of Eleanor Roosevelt and the 19 Gestures of Empowerment created by the Martha Graham Dance Company. Keeping this in mind, we were able to narrow it down to two final ideas.
The first involved using the panels at FFP to each represent a playing card, which the user would ‘flip’ over by tapping it in order to see one of the 19 gestures of empowerment. They would then have to find the matching card and gesture to unlock more information. The workflow for this is shown in Figure 3.6. The second idea involved spawning multiple pins within FFP for the user to find, each one showing an animation of one of the 19 gestures when tapped. The workflow for this is shown in Figure 3.7.

Figure 3.6: Card Matching

Figure 3.7: Scavenger Hunt

For each of these we created further high-fidelity designs in Figma to better visualize how they would work. After several rounds of testing and experiments, we determined that the scavenger hunt model would be much easier to implement given the constraints of the park. These are further described in the limitations section of this paper.

While finalizing this design, we tried different color schemes and UI options. We finally decided on a color scheme that resembled that of the Smithsonian Museum, since one of the eventual goals of this project was to have it displayed in collaboration with their team. This color scheme uses yellows and blues of varying brightness. Since the name of the product is My Day, this color scheme fit our narrative perfectly (Figure 3.8, Figure 3.9).

3.2.2 Technical Implementation

Unity

Our Augmented Reality application is built in Unity, Version 2018.3.13f, with the added ARCore packages and scripts written in C#. Our main scene has a first-person camera for AR, a point cloud system, event systems, a plane generator, environmental light, and 3D objects in the form of rotating diamonds that are scripted to spawn 3D animated models of the gestures, created in Maya, when touched.
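The tap-to-spawn behaviour of the diamonds can be pictured as casting a ray from the camera through the touch point and checking which diamond it hits first. The following is a minimal conceptual sketch in Python, not the app’s C# scripts: in Unity this test is performed by the engine’s physics raycast against colliders. The sketch assumes each diamond is approximated by a bounding sphere, and the `Diamond` class, the `pick_diamond` function and the gesture names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Diamond:
    gesture: str           # hypothetical name of the gesture this diamond spawns
    center: tuple          # (x, y, z) position in world space, in meters
    radius: float = 0.2    # assumed bounding-sphere radius


def ray_hits_sphere(origin, unit_dir, center, radius):
    """Return the distance along the ray to the sphere, or None on a miss.

    unit_dir must have length 1.
    """
    to_center = [c - o for o, c in zip(origin, center)]
    # Distance along the ray to the point closest to the sphere center.
    t_closest = sum(a * b for a, b in zip(to_center, unit_dir))
    # Squared distance from the sphere center to that closest point.
    dist_sq = sum(a * a for a in to_center) - t_closest ** 2
    if dist_sq > radius ** 2:
        return None                      # the ray passes outside the sphere
    half_chord = math.sqrt(radius ** 2 - dist_sq)
    if t_closest + half_chord < 0:
        return None                      # the sphere is entirely behind the camera
    return max(t_closest - half_chord, 0.0)


def pick_diamond(origin, direction, diamonds):
    """Return the nearest diamond hit by the ray, or None if nothing is hit."""
    norm = math.sqrt(sum(d * d for d in direction))
    unit_dir = tuple(d / norm for d in direction)
    best, best_t = None, math.inf
    for diamond in diamonds:
        t = ray_hits_sphere(origin, unit_dir, diamond.center, diamond.radius)
        if t is not None and t < best_t:
            best, best_t = diamond, t
    return best


# Hypothetical scene: two diamonds straight ahead, one off to the side.
scene = [
    Diamond("gesture_01", (0.0, 0.0, 2.0)),
    Diamond("gesture_02", (0.0, 0.0, 5.0)),
    Diamond("gesture_03", (3.0, 0.0, 2.0)),
]
tapped = pick_diamond((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), scene)
print(tapped.gesture if tapped else "no diamond hit")
```

Returning the nearest hit rather than the first match means that when two diamonds line up along the same ray, the one closer to the camera is selected.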
We use a scene controller script to map gestures to individual diamonds, each of which rotates about its own axis at a velocity controlled by a C# script. The diamonds are rigid bodies, and we use raycast collision to identify the tapped diamond and spawn a human model. We use a plane generator to give the impression that the model is standing on the ground instead of floating in the air.

Every model has a Unity Animator that goes from the Entry state to the Anim state, which has a motion attached to it that is specific to that model and gesture. The models are instantiated from their prefabs at the same location as the diamond in xyz space, scaled to imitate human scale and rotated depending on where the diamond is touched. We dynamically instantiate the pose body, and the animator controller is also loaded dynamically and attached to the pose body.

Figure 3.8: Home Screen

Figure 3.9: Model Screen

The Unity application also integrates a location component, which is discussed in the Geo-location section below.

Gesture Recognition

Our initial idea of interaction was built around gesture recognition. Since our application revolves primarily around dance, it made sense to experiment with gesture as the primary interaction with the application. We started our experiment with Microsoft’s Computer Vision API. We took 100 pictures of 3 hand gestures across the Cornell Tech campus in different lighting and background settings and trained a model to identify each gesture in order to spawn a model. The model was not able to identify gestures because it was trained with too few images. We then tried object recognition with known objects, which the model could identify with about 50% confidence. Since data generation is a difficult problem, and the confidence was not promising, we did not go ahead with this approach.
We then experimented with MediaPipe. Even though MediaPipe was able to identify gestures dynamically, it did not support integration with Unity, which is why we did not go ahead with it.

We also explored other open source OpenCV package implementations for gesture recognition. Even though we were able to count the number of open fingers on a hand (Figure 3.10), we were limited in the number of closed gestures that we could identify, and we were not able to identify complex gestures with OpenCV. There is an OpenCV for Unity package that is used to integrate OpenCV with Unity, but due to budget constraints we were not able to buy that library, and writing a library for that purpose was beyond the scope of our project.

Figure 3.10: Finger Count with OpenCV

Because of the above limitations in accuracy and user interaction, the learning curve for gestures, and the lack of funding, we decided not to go ahead with gesture recognition.

Geo-location

The project is an outdoor AR application. Users are prompted to confirm when they arrive at the target location, so the first step is to get the device’s geo-coordinates and check whether they fall roughly within the target area. The built-in GPS sensor returns the latitude and longitude of the device (camera). We used the Unity3D engine’s location service input to determine the user’s real-time location.

In the controller script, once the application is open, the code checks the user’s device for the current location service status. When the code contains certain location calls, the location permission is automatically added to the Android manifest.

Figure 3.11: Pop-up window

The user receives a pop-up window from the device prompting them to enable the location service. Once this is approved, the user’s location is retrieved and used to check whether the user has reached the target location. We also implemented methods for calculating the distance in meters between two geo-coordinates.
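One common way to compute such a distance is the haversine formula. The following is a minimal sketch in Python, not the app’s C# implementation: it assumes a spherical Earth with a mean radius of 6,371 km, the function names `haversine_m` and `has_arrived` are hypothetical, the target coordinates are approximate placeholders rather than the values used in the app, and the default threshold is the 10 meters used in this project.

```python
import math

EARTH_RADIUS_M = 6_371_000.0   # mean Earth radius in meters (spherical approximation)
ARRIVAL_THRESHOLD_M = 10.0     # arrival threshold used in this project


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def has_arrived(user, target, threshold_m=ARRIVAL_THRESHOLD_M):
    """True when the user's (lat, lon) is within threshold_m of the target."""
    return haversine_m(user[0], user[1], target[0], target[1]) <= threshold_m


# Placeholder target roughly at Four Freedoms Park (assumed, not the app's values).
TARGET = (40.7505, -73.9608)
nearby_user = (40.75054, -73.9608)    # about 4 to 5 meters north of the target
distant_user = (40.7515, -73.9608)    # about 110 meters north of the target

print(has_arrived(nearby_user, TARGET))
print(has_arrived(distant_user, TARGET))
```

The threshold is passed as a parameter so that it can be loosened on devices with poorer GPS accuracy without changing the distance calculation.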
Calling these methods gives us a direct metric for how close the user is to the target. Devices have a limit on GPS accuracy, so we manually set a threshold of 10 meters. That is, once users are within 10 meters of the target location, the device pops up a window confirming that they have reached FFP (Figure 3.11).

Maya

Since our team was not familiar with animation and 3D modeling, we conducted a lot of research to see which platforms would be the quickest and most beneficial for us to learn over the course of the semester. We found that Maya had a similar learning curve to some of the free alternatives, and since it is the industry standard in animation and we were able to obtain a student license, we decided to move forward with it.

We began by each going through multiple tutorials to familiarize ourselves with the platform. Once we felt comfortable enough to begin animation, we found a fitting mesh that represented the female form to begin modeling each of the poses. To do this, we first had to ‘rig’ the mesh, which means adding a skeleton to the female model so that each of the joints can be moved into the correct pose. This proved to be trickier than we expected. Once the joints had been marked correctly on the body, we began moving the limbs to mimic each of the 19 poses, keeping in mind that this was a 3D model. This meant we had to keep switching views of the model to ensure that the pose did in fact appear correct from all sides, and not just from the front, where the appearance could be deceiving.

From here we began working with the animation tools within the platform. To do this, we first set an initial number of frames for each pose, in our case 200. Then at frame 0, all of the joints and limbs had to be set to a neutral position.
From here, each joint was moved individually to its final position within a range of frames to ensure that the movement from neutral to pose appeared smooth. In some cases, especially in poses using kicks, this involved saving the joints’ movements across multiple frames so that the move looked more realistic and did not appear as if the model had contorted in order to arrive at the final position. Once this was complete, we put together a much longer animation showcasing all the poses along with improvisations in order to present the gestures as a coherent routine. This was created using choreography sent to us by the Martha Graham Dance Company.

CHAPTER 4
RESULTS

The final result of this project was a working mobile AR application built for an Android phone using ARCore and Unity. The app works by detecting the user’s location and, upon their arrival at FFP, spawning 3D models representing pins across the park. The users can then tap on each pin to bring to life the animation of each of the 19 gestures. The final app design currently resembles a sculpture park, allowing the user to walk through all of the final poses and explore further. A demo video of our project result is available on YouTube: https://youtu.be/WIza3Qzefac

CHAPTER 5
DISCUSSION

As we explored opportunities within AR in our initial market research phase, we found that while spatial projection would allow for a multi-user experience, having the equipment on site at FFP was simply not feasible, and would only allow for a brief installation rather than a permanent experience. We were limited in the world of optical see-through AR as well, because this would require us to have multiple headsets available to users as they went through the exhibit. Overall, our research found that video see-through AR on a mobile device was the most easily accessible experience for all users.
While our initial idea involving playing cards on the panels sounded a lot more interactive than the current product, our discussions and research with users showed that this might not be the most effective way to present the materials. We were concerned that the matching process between the cards could cause frustration and fatigue for the users, and cause them to abandon the application. Based on the results of our experiments, we were also concerned about our ability to correctly place and present the playing cards to the users.

Another reason we chose to move forward with the scavenger hunt model instead was to allow the same concept to be applied in different parts of the city, or in different cities. With the playing cards model, we would have limited the scope of future iterations of the project to FFP or similarly paneled locations.

We also found that debugging proved to be quite difficult during the technical implementation phase, and significantly slowed our progress towards the final product. While we had designs in mind, we prioritized having the technical elements working correctly before moving forward with UI implementations.

5.1 Limitations

One of the most elusive pieces of an AR application is occlusion, in other words, the ability to hide virtual objects behind real things. This concern also applies to our project. Currently, some virtual objects that are behind a real object are not hidden behind that real object.

There are two main solutions for the creation of augmented reality applications, ARKit and ARCore. Our team is composed of developers used to both platforms. ARKit is only compatible with iPads or iPhones running iOS 11 and higher. ARCore, on the other hand, runs on portable devices running Android 7.0 Nougat and higher. Both AR platforms have similar capabilities.
Since the number of Android users is much higher than the number of iOS users, we decided to move forward with ARCore. As a result, the current project is built only for Android devices.

5.2 Future work

As part of our future work, we would like to create a much more thorough user interface that gives users more control over the experience. With more funding we would like to integrate packages like OpenCV with this project and make more use of computer vision packages for Unity. We would also like to work more on the occlusion properties for the placement of these models, since this is one of the more prominent limitations we ran into while building the product.

In building out these new UI features, we would like to focus on creating some textual elements as well that would be more direct in showcasing the work of Eleanor Roosevelt. In its current form, this representation is abstract, but users may benefit from more explicit information in relation to the park.

Another idea we would like to push forward is to build this product as a framework that can be easily adapted to different parts of New York City, to showcase the work of other choreographers and notable women throughout history. This model is flexible in that sense, and if we create the foundation for people to simply import new models and location boundaries, we could use this product to enhance historical experiences around the world.

CHAPTER 6
CONCLUSION

While we began this project with an emphasis on Four Freedoms Park, we progressed towards making it more accessible in different locations. The 19 Gestures of Empowerment by the Martha Graham Dance Company were very effective in communicating a much larger need for feminism, and for its presence within locations such as FFP. Overall, this project taught us valuable skills in augmented reality, 3D modeling and location services across different platforms.
Our experimentation in particular helped us assess the limitations of all the platforms we were working within. We were also able to practice many new skills with user research and design.

BIBLIOGRAPHY

[1] Consolidated Market Research for Eleanor Roosevelt AR, https://drive.google.com/open?id=10f50KU8ocMLBtn5ADLUq6mYmh25VN6ofEQmfzEe9StY

[2] “Dust.” Dust, www.vrdust.org.uk/.

[3] “Make the Line Dance.” Vimeo, 1024, 21 Mar. 2011, www.vimeo.com/21308228.

[4] “Notable Women - About.” Google, www.notablewomen.withgoogle.com/about.

[5] “Bone Hall.” North American Mammals: Sciurus Niger: Species Information, Museum of Natural History, http://naturalhistory.si.edu/exhibits/bone-hall.

[6] “MoMAR.” MoMAR, momar.gallery/exhibitions/werefromtheinternet.html.

[7] Punn, Ambuj. “ARKit 101: How to Place 2D Images, Like a Painting or Photo, on a Wall in Augmented Reality.” Next Reality, Next Reality, 24 Oct. 2018, mobile-ar.reality.news/how-to/arkit-101-place-2d-images-like-painting-photo-wall-augmented-reality-0187598/.

[8] Apple Inc. “ARKit - Apple Developer.” ARKit - Apple Developer, developer.apple.com/arkit/.

[9] “Documentation.” Mapbox, docs.mapbox.com/.

[10] Mapbox. “Mapbox/Mapbox-Arkit-Ios.” GitHub, 25 Oct. 2017, github.com/mapbox/mapbox-arkit-ios.

[11] “Using Maps and Location Services with ARKit.” Points of Interest, Points of Interest, 9 Aug. 2017, blog.mapbox.com/using-maps-and-location-services-with-arkit-a1980903ca96.

[12] Weng, Hanleyweng. “Hanleyweng/Gesture-Recognition-101-CoreML-ARKit.” GitHub, 11 Dec. 2017, github.com/hanleyweng/Gesture-Recognition-101-CoreML-ARKit.

[13] “Visual Intelligence Made Easy.” Custom Vision - Home, www.customvision.ai/.