ELEANOR ROOSEVELT AR
A Thesis
Presented to the Faculty of the Graduate School
of Cornell University
in Partial Fulfillment of the Requirements for the Degree of
MSc.
by
Zhiling Hu, Ananya Paul, Meera Nanda
May 2020
© 2020 Zhiling Hu, Ananya Paul, Meera Nanda
ALL RIGHTS RESERVED
Figure 1: Eleanor Roosevelt AR as designed by Michael Byrne
ABSTRACT
For this project, we chose to focus on the absence of Eleanor Roosevelt (ER) at
Four Freedoms Park (FFP) in New York City, despite her active political partic-
ipation during Franklin D. Roosevelt’s presidency. Our goal was to communi-
cate this participation using Augmented Reality (AR) and the power of dance.
We conducted several forms of research to determine the best approach and found
that an interactive mobile AR app was the most suitable medium, since it lets each
user have their own experience. Our experimentation with AR also helped us assess
the limits of what could be implemented within the project’s time frame.
CHAPTER 1
INTRODUCTION
Eleanor Roosevelt, the former First Lady of the United States, was an extraordinary
voice for social justice and peace. Her legacy serves as a conduit to highlight the
legacies of other trailblazing women from history. For example, at ER’s invitation,
Martha Graham, a world leader in the evolving art form of modern dance, performed
some of her Americana pieces at the White House in 1937, reflecting ER’s belief in
dance as a form of diplomacy.
This interdisciplinary project aims to remedy the absence of ER on Roosevelt
Island using augmented reality. We aim to explore the embodied performance
of dance at Roosevelt Island’s Four Freedoms Park that brings to life the charac-
ter and works of ER (and potentially contributions of other pioneering women).
We hope to collaborate with curators and choreographers in creating engaging
content that celebrates Eleanor’s 1948 co-authoring of the UN’s Universal Dec-
laration of Human Rights.
We created a series of research questions to help us tackle this project, ranging
from embodying historical values to turning the passive visitor experience into
active engagement through the design of open-ended interactions.
• How can we bring FFP to life with history, dance and women’s empowerment
using technology such as Augmented Reality?
• Can we use specific features of historic buildings as surfaces for projecting AR
content?
• How can we create a valuable experience for visitors, both physically and
virtually, that prompts them to engage with the work?
Figure 2.1: Capture from Dust
CHAPTER 2
RELATED WORK
For our research, we started by looking into AR products and technology that exist
in the realms of dance, women’s empowerment, museums and legal aspects (see [1]
in the Bibliography for a more detailed research document).
2.1 Dance
There is extensive research in the realm of dance performance, largely supported
by emerging technologies such as motion tracking cameras. We found projects such
as “Dust” [2] (Figure 2.1), which uses a VR headset and Kinect depth sensors to
place the audience in the immediate presence of the dancer. We also saw “Make
the Line Dance” [3] (Figure 2.2), which uses Kinect to track human skeletons and
then maps a video layer over a human body with QC, MadMapper and MaxForLive.
Figure 2.2: Capture from Make the Line Dance
Figure 2.3: Harriet Tubman on $1 bill from “Notable Women”
2.2 Women’s Empowerment
Even though women make up half the workforce, there are far fewer immersive
experiences in this realm. “Notable Women” [4] (Figure 2.3) is an AR app that
adds 100 prominent American women to US currency. “The Whole Story” is
another project, which allows users to place AR statues of women throughout
Central Park.
2.3 Museums and Heritage
AR is a prominent emerging technology used in several museum installations,
which explore ways to enliven archival exhibits. The “Skin and Bones” [5]
(Figure 2.4) exhibit at the Smithsonian Museum uses an AR app that, when
pointed at the bones of prehistoric animals on display, brings them to life.
“MoMAR” [6] is another useful open source project that makes paintings at
MoMA interactive for users.
Figure 2.4: Capture from Skin and Bones
2.4 Legal
Laws around AR and art are currently unsettled. Copyright law may apply when
other artists’ work is presented in alternate locations without explicit permission.
There are also potential issues under the Visual Artists Rights Act when it comes
to augmenting artists’ work. Concepts such as “virtual trespassing” have been
introduced recently in response to games such as Pokémon Go intruding on
people’s personal spaces without permission.
CHAPTER 3
APPROACH, METHOD AND IMPLEMENTATION
We approached this project as if we were building a consumer product, and
therefore went through several stages of research and design before we began
the technical implementation, as shown below.
3.1 Research
3.1.1 Market Research
We began this part of our research by exploring existing work in the AR landscape.
The initial investigation was online, and all relevant articles and projects were
compiled into a consolidated research document, which is linked at the end of this
report [1]. Next, we reached out to people in the AR industry whom we found while
conducting this research, to learn more about their projects and inform our own.
This research also included a deep dive into the technical capabilities of augmented
reality: we explored the opportunities for development within video see-through
AR, spatial projection and optical see-through AR.
3.1.2 User Research
We started with a survey to assess how locals and tourists go about exploring
New York City and better understand what types of exhibits draw the biggest
crowds. We then conducted contextual inquiries at Four Freedoms Park, the
location for the launch of our product. We also met with the staff at FFP to learn
about visitor behavior.
3.1.3 Location Study
In a similar fashion, we conducted observational studies at FFP, the intended
venue for our project launch, to see how users interact with the space. We also
met with a member of the park staff, Madeline Grimes, to better understand the
space. Through this, we gathered valuable information about peak visitation
times, how we were allowed to use and interact with the space, and what
modifications we were allowed to make, and we also received schematics of the
park to help us design our product.
3.1.4 Experiments
We discovered from our contextual inquiries that visitors to FFP tend to take many
pictures on their phones, mostly of the city skyline, directed outward. Visitors
from New York City come mainly to relax, while tourists come primarily for a
different view of the city. Visitors take photos with their phones as well as with
stand-alone cameras.
From talking to the park staff, we also learned that most visitors come in the
afternoon. Third parties sometimes host events, either during the day or in the
evening after hours. The chart in Appendix 1, provided by the FFP authorities,
shows the distribution of cumulative footfall by month from 2013 to 2019. The
authorities also told us that they are welcoming and open to any installations we
might need for our project.
Once we had settled on our final product idea, the next step was to design
experiments to test its feasibility. We decided on the following three experiments:
• Can we detect a vertical plane and project a rectangle onto it?
• Can we detect the United Nations (UN) building from FFP and project a
rectangle onto it?
• Can we use gesture recognition to trigger a projection?
Each of us worked through various tutorials and trials to get results for these
experiments, which are detailed below.
Experiment 1: Can we detect a vertical plane and project a rectangle onto it?
We used a tutorial by Next Reality [7] to first identify a vertical plane. The plane
detection shown in Figure 3.1 was implemented primarily in Xcode using ARKit [8].
We found that we were able to identify salient vertical planes where the edges are
clear, such as the one in Figure 3.1. However, when we pointed the phone’s camera
at a wall whose edges were not clear, detection was much less effective. This may
not be a problem for us, because the panels we intend to use at the park have
clearly marked edges. We then successfully projected a demo poster onto the plane
(Figure 3.2). Notably, the angle of the plane is correctly identified, and the
projected image is angled to match.
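To make the “angled to match” observation concrete, the sketch below computes where a poster’s corners land on a detected vertical plane. It is not the ARKit code we used: it assumes only that the plane is given by a center point and an outward normal in a y-up world, and the helper `poster_corners` is hypothetical. Because the rectangle is built from the wall’s own horizontal direction, it inherits the wall’s angle automatically.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def _normalize(v: Vec3) -> Vec3:
    length = math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)
    if length == 0.0:
        raise ValueError("cannot normalize a zero-length vector")
    return (v[0] / length, v[1] / length, v[2] / length)


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def poster_corners(center: Vec3, normal: Vec3, width: float, height: float) -> List[Vec3]:
    """Bottom-left, bottom-right, top-right, top-left corners of a rectangle
    lying flat on a vertical plane (hypothetical helper, y-up world)."""
    up: Vec3 = (0.0, 1.0, 0.0)
    # A vertical plane has a horizontal normal, so drop any y component.
    n = _normalize((normal[0], 0.0, normal[2]))
    # Horizontal direction running along the wall surface.
    right = _normalize(_cross(up, n))
    half_w, half_h = width / 2.0, height / 2.0
    corners: List[Vec3] = []
    for sx, sy in ((-1, -1), (1, -1), (1, 1), (-1, 1)):
        corners.append(tuple(
            center[i] + sx * half_w * right[i] + sy * half_h * up[i]
            for i in range(3)
        ))
    return corners
```

For a wall turned 45° away from the camera, `poster_corners((0.0, 0.0, -2.0), (1.0, 0.0, 1.0), 0.6, 0.9)` returns four corners that all lie in that wall’s plane.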
Experiment 2: Can we detect the UN building from FFP and project a rectangle onto it?
Figure 3.1: Plane detection
Figure 3.2: Poster placed on plane
We first tested the existing ARKit capabilities and found that the maximum capture
distance was around 1.5 meters. We therefore realized we needed a more exploratory
approach to “world-scale” AR. After some research, we moved forward with the
Mapbox library [9], aiming to inform ARKit of global positioning using location
data from the phone. This allows items to be placed within the AR world using
real-world coordinates. We tried the new Mapbox iOS + ARKit library [10] and
followed its beginner’s tutorial [11]. The demo combines Mapbox maps and location
services with ARKit in the application, anchoring objects to GPS coordinates so
that AR experiences can work outdoors over long distances, rather than only within
a short range of the device. Because of weather conditions, we were only able to
test it indoors (Figure 3.3).
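The core idea of anchoring content to GPS coordinates can be sketched as converting a latitude and longitude into a metric offset from a reference point. The function below is a hypothetical illustration that uses an equirectangular approximation and a spherical Earth of radius 6,371 km. It is not Mapbox’s implementation, and it assumes the AR session’s axes have been aligned to compass heading (for example with ARKit’s `gravityAndHeading` world alignment), so that east and north map onto fixed scene axes.

```python
import math
from typing import Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; spherical-Earth assumption


def geo_to_local_offset(origin_lat: float, origin_lon: float,
                        lat: float, lon: float) -> Tuple[float, float]:
    """Approximate (east, north) offset in meters of (lat, lon) from an origin.

    Equirectangular approximation: adequate over a few hundred meters,
    increasingly inaccurate over longer distances or near the poles.
    """
    d_lat = math.radians(lat - origin_lat)
    d_lon = math.radians(lon - origin_lon)
    # A degree of longitude shrinks with the cosine of the latitude.
    east = d_lon * math.cos(math.radians(origin_lat)) * EARTH_RADIUS_M
    north = d_lat * EARTH_RADIUS_M
    return east, north
```

In a y-up engine, the returned east and north values would become the anchor’s x and z coordinates relative to wherever the session placed the reference point.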
Figure 3.3: Location detection
Experiment 3: Can we use gesture recognition to trigger a projection?
We followed a tutorial [12] for gesture recognition with ARKit and CoreML.
We decided on a set of gestures for our experiment and collected a total of 255
images for each gesture under different lighting conditions and backgrounds. We
then trained a model on these images in Microsoft’s Custom Vision [13] and achieved
100% precision. When we tested whether the model recognized each gesture, it
achieved 100% prediction accuracy on our test (Figure 3.4).
We then wanted to build the Unity application for mobile. Due to a lack of
resources, we could not test on an iPhone, so we tested with an Android application
instead. Following SJ-Wolf’s tutorial [14], we were able to correctly detect objects
from their provided dataset (Figure 3.5).
Figure 3.4: Prediction accuracy
Figure 3.5: Object recognition
However, we were not able to detect gestures with the TensorFlow model we had
previously trained with Custom Vision. We plan to try gesture detection with ARKit
over the next semester, once we have the necessary resources, and hope to get
gesture recognition working there.
3.2 Implementation
3.2.1 Design
We conducted an internal design sprint to finalize our product requirements
and functionality. After going through individual brainstorming sessions, the
three of us came up with approximately 15 unique ideas each. We then created
an affinity diagram to organize our ideas and narrowed them down to our top three.
We created detailed pitches for each idea to flesh out the features we could include,
and were then able to select the best product idea.
The goal was to incorporate the spirit of Eleanor Roosevelt and the 19 Gestures of
Empowerment created by the Martha Graham Dance Company. With this in mind,
we narrowed our options down to two final ideas. The first used the panels at FFP
as playing cards: the user would ‘flip’ each panel over by tapping it to reveal one
of the 19 gestures of empowerment, and would then have to find the matching card
and gesture to unlock more information. The workflow for this is shown in Figure
3.6. The second idea involved spawning multiple pins within FFP for the user to
find, each showing an animation of one of the 19 gestures when tapped. The
workflow for this is shown in Figure 3.7.
For each of these ideas, we created high-fidelity designs in Figma to better
visualize how they would work. After several rounds of testing and experiments,
we determined that the scavenger hunt model would be much easier to implement
given the constraints of the park. These constraints are described further in the
Discussion (Chapter 5).
Figure 3.6: Card Matching
Figure 3.7: Scavenger Hunt
While finalizing this design, we experimented with different color schemes and UI
options. We finally chose a color scheme resembling that of the Smithsonian
Museum, since one of the eventual goals of this project was to have it displayed in
collaboration with their team. This color scheme uses yellows and blues of varying
brightness. Since the name of the product is My Day, this color scheme fits our
narrative well (Figure 3.8, Figure 3.9).
3.2.2 Technical Implementation
Unity
Our Augmented Reality application is built in Unity (version 2018.3.13f) with the
ARCore package added, and its scripts are written in C#. The main scene contains
a first-person AR camera, a point cloud system, an event system, a plane generator,
environmental lighting, and 3D rotating diamonds that are scripted to spawn
animated 3D models of the gestures, created in Maya, when the user touches them.
A scene controller script maps each gesture to an individual diamond. Each diamond
rotates in space about its own axis at a velocity controlled by a C# script. The
diamonds are rigid bodies, and raycast collision is used to identify which diamond
the user touched and spawn the corresponding human model. A plane generator is
used so that the model appears to stand on the ground instead of floating in the air.
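Unity performs this hit test itself through `Physics.Raycast` against the diamonds’ colliders. The sketch below reproduces the idea outside Unity under simplifying assumptions: each diamond’s collider is approximated by a sphere, only the closest hit counts (as with a single raycast), and the `Diamond` class, gesture names and rotation speed are hypothetical rather than taken from our C# scripts.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Diamond:
    gesture: str                   # hypothetical name of the gesture this diamond spawns
    center: Vec3
    radius: float                  # collider approximated as a sphere (assumption)
    angle_deg: float = 0.0
    speed_deg_per_s: float = 45.0  # hypothetical rotation speed

    def spin(self, dt: float) -> None:
        """Advance the rotation about the diamond's own vertical axis."""
        self.angle_deg = (self.angle_deg + self.speed_deg_per_s * dt) % 360.0


def ray_sphere_hit(origin: Vec3, direction: Vec3, center: Vec3, radius: float) -> Optional[float]:
    """Distance along a unit-length ray to the nearest sphere intersection, or None."""
    oc = [o - ctr for o, ctr in zip(origin, center)]
    b = sum(d * x for d, x in zip(direction, oc))
    c = sum(x * x for x in oc) - radius * radius
    disc = b * b - c
    if disc < 0:
        return None  # the ray misses the sphere
    root = math.sqrt(disc)
    t = -b - root
    if t < 0:
        t = -b + root  # the ray starts inside the sphere
    return t if t >= 0 else None


def pick_diamond(origin: Vec3, direction: Vec3, diamonds: Iterable[Diamond]) -> Optional[Diamond]:
    """Return the closest diamond hit by the touch ray, or None if nothing is hit."""
    length = math.sqrt(sum(d * d for d in direction))
    unit = tuple(d / length for d in direction)
    best, best_t = None, math.inf
    for diamond in diamonds:
        t = ray_sphere_hit(origin, unit, diamond.center, diamond.radius)
        if t is not None and t < best_t:
            best, best_t = diamond, t
    return best
```

The picked diamond’s `gesture` field then plays the role of the scene controller’s mapping, telling the app which animated model to spawn.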
Figure 3.8: Home Screen
Figure 3.9: Model Screen
Every model has a Unity Animator that transitions from the Entry state to the Anim
state, which holds the motion specific to that model and gesture. The models are
instantiated from their prefabs at the same xyz position as the diamond, scaled to
human size, and rotated according to the direction from which the user touches the
diamond. Both the pose body and its animator controller are loaded dynamically,
and the controller is then attached to the pose body.
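The placement rules above (same position as the diamond, human scale, rotation based on where the user touched) reduce to two small calculations. The sketch assumes a y-up, Unity-style convention in which an unrotated model faces +z and a positive yaw turns +z toward +x, so the result could be passed to `Quaternion.Euler(0, yaw, 0)` in C#. The function names and the 1.65 m target height are hypothetical.

```python
import math


def facing_yaw_deg(model_pos, camera_pos) -> float:
    """Yaw in degrees about the vertical y axis that turns a model whose
    forward axis is +z toward the camera (y-up, Unity-style convention)."""
    dx = camera_pos[0] - model_pos[0]
    dz = camera_pos[2] - model_pos[2]
    return math.degrees(math.atan2(dx, dz))


def human_scale_factor(model_height: float, target_height_m: float = 1.65) -> float:
    """Uniform scale that brings a model of `model_height` units to a
    (hypothetical) target human height in meters."""
    if model_height <= 0:
        raise ValueError("model height must be positive")
    return target_height_m / model_height
```

Using the camera position rather than the exact touch point keeps the model facing the user even if they tapped the edge of the diamond.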
The Unity application also integrates the location component, which is discussed
in the Geo-location section below.
Gesture Recognition
Our initial interaction concept centered on gesture recognition. Since our
application revolves primarily around dance, it made sense to experiment with
gestures as the primary way of interacting with it. We started by experimenting
with Microsoft’s Computer Vision API. We took 100 pictures of 3 hand gestures
across the Cornell Tech campus in different lighting conditions and backgrounds,
and trained a model to identify each gesture and spawn the corresponding model.
The model was not able to identify the gestures because it was trained on too few
images. We then tried object recognition with known objects, which the model
identified with only about 50% confidence. Since generating enough training data
is difficult and the confidence was not promising, we did not pursue this approach.
We then experimented with MediaPipe. Although MediaPipe was able to identify
gestures dynamically, it did not support integration with Unity, so we did not
pursue it.
Figure 3.10: Finger Count with OpenCV
We also explored other open-source OpenCV implementations of gesture recognition.
Although we were able to count the number of open fingers in a hand (Figure 3.10),
we could identify only a limited number of closed gestures, and we were not able to
identify complex gestures with OpenCV. An OpenCV for Unity package exists for
integrating OpenCV with Unity, but due to budget constraints we were not able to
buy that library, and writing an equivalent library ourselves was beyond the scope
of our project. Because of these limitations in accuracy, user interaction, the
learning curve of the gestures and funding, we decided not to pursue gesture
recognition.
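A common way to count open fingers with OpenCV, which also shows why closed gestures are hard, is to inspect the convexity defects of the hand contour: each deep, sharp gap between hull points is taken to be the space between two extended fingers. We have not confirmed that the package we tested used exactly this heuristic. In OpenCV, `cv2.convexityDefects` returns each defect as start, end and farthest-point indices plus a fixed-point depth (divide by 256 for pixels); the sketch below assumes those have already been converted into points and a pixel depth, and its 90° and 20-pixel thresholds are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]
# (start, end, far, depth): one convexity defect, depth in pixels.
Defect = Tuple[Point, Point, Point, float]


def defect_angle_deg(start: Point, end: Point, far: Point) -> float:
    """Angle at the `far` point between the two neighboring hull points (law of cosines)."""
    a = math.dist(start, end)
    b = math.dist(start, far)
    c = math.dist(end, far)
    if b == 0 or c == 0:
        return 180.0  # degenerate defect: treat as flat
    cos_angle = (b * b + c * c - a * a) / (2 * b * c)
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))


def count_fingers(defects: Sequence[Defect], max_angle_deg: float = 90.0,
                  min_depth_px: float = 20.0) -> int:
    """Count extended fingers from convexity defects.

    Each sufficiently deep, sharp defect is a gap between two extended fingers,
    so n gaps imply n + 1 fingers. With no gaps the heuristic cannot tell a fist
    from a single raised finger, and it returns 0.
    """
    gaps = sum(
        1 for start, end, far, depth in defects
        if depth >= min_depth_px and defect_angle_deg(start, end, far) < max_angle_deg
    )
    return gaps + 1 if gaps > 0 else 0
```

The zero-gap case is exactly where closed gestures fall: a fist, a single finger and many other closed shapes produce no qualifying gaps, so this kind of heuristic cannot separate them.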
Geo-location
The project is an outdoor AR application, and users are notified when they arrive
at the target location. The first step is therefore to obtain the device’s
geo-coordinates and check whether they are roughly within the target area. The
built-in GPS sensor returns the latitude and longitude of the device, and therefore
of the camera. We used the Unity3D engine’s location service input to determine
the user’s real-time location.
Figure 3.11: Pop-up window
When the application opens, the controller script checks the current status of the
location service on the user’s device. Because the code calls the location service,
the location permission is automatically added to the Android manifest. The user
then receives a pop-up window from the device prompting them to enable the
location service. Once they approve, the user’s location is retrieved and used to
check whether they have reached the target location.
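The status check follows the polling pattern shown in Unity’s `LocationService` documentation: confirm `Input.location.isEnabledByUser`, call `Input.location.Start()`, then wait while the status is still initializing. The sketch below models that flow in Python so it can run outside Unity; it assumes `Start()` has already been called, and `wait_for_location`, the injected `poll_status` and `sleep` callables and the 20-second limit are hypothetical stand-ins for our C# controller logic.

```python
from enum import Enum
from typing import Callable


class LocationStatus(Enum):
    """Mirrors the four values of Unity's LocationServiceStatus enum."""
    STOPPED = "Stopped"
    INITIALIZING = "Initializing"
    RUNNING = "Running"
    FAILED = "Failed"


def wait_for_location(enabled_by_user: bool,
                      poll_status: Callable[[], LocationStatus],
                      sleep: Callable[[float], None],
                      max_wait_s: int = 20) -> bool:
    """Return True once the location service is running.

    Gives up if the user has disabled location, if the service fails,
    or if it is still initializing after `max_wait_s` one-second polls.
    """
    if not enabled_by_user:
        return False
    waited = 0
    status = poll_status()
    while status is LocationStatus.INITIALIZING and waited < max_wait_s:
        sleep(1.0)
        waited += 1
        status = poll_status()
    return status is LocationStatus.RUNNING
```

Injecting `poll_status` and `sleep` keeps the timeout logic testable without a device; in Unity the equivalent loop would be a coroutine that yields for one second between polls.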
We also implemented methods for calculating the distance in meters between two
geo-coordinates, which give a direct measure of how close the user is to the target.
Because GPS accuracy on mobile devices is limited, we manually set a threshold of
10 meters: if the user is no more than 10 meters from the target location, the device
shows a pop-up window confirming that they have reached FFP (Figure 3.11).
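The text does not specify which distance formula these methods use. The haversine formula below is one standard choice for the great-circle distance between two latitude/longitude points, shown with the 10-meter arrival threshold described above. It assumes a spherical Earth of radius 6,371 km, and the names `haversine_m` and `has_arrived` are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; spherical-Earth assumption
ARRIVAL_THRESHOLD_M = 10.0    # threshold described in the text


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def has_arrived(user_lat: float, user_lon: float,
                target_lat: float, target_lon: float,
                threshold_m: float = ARRIVAL_THRESHOLD_M) -> bool:
    """True when the user is no more than `threshold_m` meters from the target."""
    return haversine_m(user_lat, user_lon, target_lat, target_lon) <= threshold_m
```

At this scale one ten-thousandth of a degree of latitude is roughly 11 meters, so the 10-meter threshold is only slightly larger than the typical error of a phone’s GPS fix.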
Maya
Since our team was not familiar with animation and 3D modeling, we researched
which platforms would be the quickest and most beneficial for us to learn over the
course of the semester. We found that Maya had a learning curve similar to that of
some free alternatives, and since it is the industry standard in animation and we
were able to obtain a student license, we decided to move forward with it.
Each of us began by working through multiple tutorials to become familiar with the
platform. Once we felt comfortable enough to begin animating, we found a suitable
mesh representing the female form on which to model each of the poses. To do this,
we first had to ‘rig’ the mesh, that is, add a skeleton to the female model so that
each joint could be moved into the correct pose. This proved trickier than we
expected. Once the joints had been placed correctly on the body, we began moving
the limbs to mimic each of the 19 poses, keeping in mind that this was a 3D model.
This meant we had to keep switching views to ensure that each pose appeared
correct from all sides, and not just from the front, where the appearance could be
deceiving.
We then began working with the platform’s animation tools. First we set the
number of frames for each pose, which in our case was 200. At frame 0, all of the
joints and limbs were set to a neutral position. From there, each joint was moved
individually to its final position over a range of frames so that the movement from
neutral to the final pose appeared smooth. In some cases, especially in poses
involving kicks, this meant saving the joints’ positions at multiple intermediate
frames so that the move looked more realistic and the model did not appear to
contort itself to reach the final position. Once this was complete, we put together
a much longer animation showcasing all the poses along with improvisations, in
order to present the gestures as a coherent routine. This was created using
choreography sent to us by the Martha Graham Dance Company.
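The keyframing process above can be illustrated by sampling each joint’s value between saved keyframes. The sketch uses plain linear interpolation for clarity; Maya’s default tangents produce eased spline curves instead, so this is a simplification, and the joint names and angles are hypothetical values for a 200-frame kick, with an intermediate key at frame 120 like the extra frames described for kicks.

```python
from bisect import bisect_right
from typing import Dict, List, Tuple

# (frame, joint angle in degrees), sorted by frame.
Keyframes = List[Tuple[int, float]]


def sample(keys: Keyframes, frame: float) -> float:
    """Linearly interpolate a joint value at `frame` from sorted keyframes."""
    frames = [f for f, _ in keys]
    if frame <= frames[0]:
        return keys[0][1]   # hold the first key before the animation starts
    if frame >= frames[-1]:
        return keys[-1][1]  # hold the final pose after the last key
    i = bisect_right(frames, frame)
    (f0, v0), (f1, v1) = keys[i - 1], keys[i]
    t = (frame - f0) / (f1 - f0)
    return v0 + t * (v1 - v0)


def pose_at(joint_keys: Dict[str, Keyframes], frame: float) -> Dict[str, float]:
    """Evaluate every joint of the rig at one frame."""
    return {joint: sample(keys, frame) for joint, keys in joint_keys.items()}


# Hypothetical kick: neutral at frame 0, overshoot at 120, settle by 200.
kick = {"hip": [(0, 0.0), (120, 95.0), (200, 80.0)],
        "knee": [(0, 0.0), (200, 40.0)]}
```

Adding intermediate keys, as in the `hip` curve, is what keeps a fast movement such as a kick from taking a straight, contorted path between the neutral and final poses.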
CHAPTER 4
RESULTS
The final result of this project was a working mobile AR application for Android
phones, built using ARCore and Unity. The app detects the user’s location and,
upon their arrival at FFP, spawns 3D models representing pins across the park.
Users can then tap each pin to bring to life the animation of one of the 19 gestures.
The final app design currently resembles a sculpture park, allowing users to walk
through all of the final poses and explore further. A video demo of the project is
available on YouTube:
https://youtu.be/WIza3Qzefac
CHAPTER 5
DISCUSSION
As we explored opportunities within AR in our initial market research
phase, we found that while spatial projection would allow for a multi-user expe-
rience, having the equipment on site at FFP was simply not feasible, and would
only allow for a brief installation rather than a permanent experience. We were
limited in the world of optical see through AR as well because this would re-
quire us to have multiple headsets available to users as they went through the
19
exhibit. Overall, our research found that using video see through AR through a
mobile device was the most easily accessible experience for all users.
While our initial idea of using the panels as playing cards seemed more interactive
than the current product, our discussions and research with users showed that it
might not be the most effective way to present the material. We were concerned
that matching the cards could frustrate and tire users and cause them to abandon
the application. Based on the results of our experiments, we were also concerned
about our ability to place and present the playing cards to users correctly.
Another reason we chose the scavenger hunt model was that it allows the same
concept to be applied in different parts of the city, or in different cities. The
playing card model would have limited future iterations of the project to FFP or
similarly paneled locations.
We also found that during the technical implementation phase, debugging proved
quite difficult and significantly slowed our progress toward the final product.
Although we had designs in mind, we prioritized getting the technical elements
working correctly before moving on to UI implementation.
5.1 Limitations
One of the most elusive aspects of AR applications is occlusion, that is, the ability
to hide virtual objects behind real ones. This concern also applies to our project:
currently, virtual objects that are behind a real object are not hidden by it.
There are two main frameworks for building mobile augmented reality applications:
ARKit and ARCore. Our team includes developers experienced with both platforms.
ARKit is only compatible with iPads and iPhones running iOS 11 or higher, while
ARCore runs on portable devices running Android 7.0 (Nougat) or higher. Both
platforms have similar capabilities. Since Android users far outnumber iOS users,
we decided to move forward with ARCore. As a result, the current project is only
built for Android devices.
5.2 Future work
As part of our future work, we would like to create a much more thorough user
interface for the users to have more control over the experience. With more
funding we would like to integrate packages like OpenCV with this project and
be able to make more use of computer vision packages for Unity. We would also
like to work more on the occlusion properties for the placement of these models,
since this is one of the more prominent limitations we ran into while building
the product.
In building out these new UI features, we would also like to add textual elements
that showcase the work of Eleanor Roosevelt more directly. In its current form, this
representation is abstract, and users may benefit from more explicit information in
relation to the park.
Another idea we would like to pursue is to build this product as a framework that
can be easily adapted to different parts of New York City, to showcase the work of
other choreographers and notable women throughout history. The model is flexible
in this sense: if we create a foundation that lets people simply import new models
and location boundaries, the product could be used to enhance historical
experiences around the world.
CHAPTER 6
CONCLUSION
While we began this project with an emphasis on Four Freedoms Park, we progressed
toward making it accessible in different locations. The 19 Gestures of Empowerment
by the Martha Graham Dance Company were very effective in communicating a
much larger need for feminism and for its presence in locations such as FFP.
Overall, this project taught us valuable skills in augmented reality, 3D modeling and
location services across different platforms. Our experimentation in particular
helped us assess the limitations of all the platforms we worked with. We were also
able to practice many new skills in user research and design.
BIBLIOGRAPHY
[1] Consolidated Market Research for Eleanor Roosevelt AR, https://drive.google.com/open?id=10f50KU8ocMLBtn5ADLUq6mYmh25VN6ofEQmfzEe9StY
[2] “Dust.” Dust, www.vrdust.org.uk/.
[3] “Make the Line Dance.” Vimeo, 1024, 21 Mar. 2011, www.vimeo.com/21308228.
[4] “Notable Women - About.” Google, www.notablewomen.withgoogle.com/about.
[5] “Bone Hall.” North American Mammals: Sciurus Niger: Species Information, Museum of Natural History, naturalhistory.si.edu/exhibits/bone-hall.
[6] “MoMAR.” MoMAR, momar.gallery/exhibitions/werefromtheinternet.html.
[7] Punn, Ambuj. “ARKit 101: How to Place 2D Images, Like a Painting or Photo, on a Wall in Augmented Reality.” Next Reality, Next Reality, 24 Oct. 2018, mobile-ar.reality.news/how-to/arkit-101-place-2d-images-like-painting-photo-wall-augmented-reality-0187598/.
[8] Apple Inc. “ARKit - Apple Developer.” ARKit - Apple Developer, developer.apple.com/arkit/.
[9] “Documentation.” Mapbox, docs.mapbox.com/.
[10] Mapbox. “Mapbox/Mapbox-Arkit-Ios.” GitHub, 25 Oct. 2017, github.com/mapbox/mapbox-arkit-ios.
[11] “Using Maps and Location Services with ARKit.” Points of Interest, Points of Interest, 9 Aug. 2017, blog.mapbox.com/using-maps-and-location-services-with-arkit-a1980903ca96.
[12] Weng, Hanleyweng. “Hanleyweng/Gesture-Recognition-101-CoreML-ARKit.” GitHub, 11 Dec. 2017, github.com/hanleyweng/Gesture-Recognition-101-CoreML-ARKit.
[13] “Visual Intelligence Made Easy.” Custom Vision - Home, www.customvision.ai/.