A Break from Reality: Modernizing Authentication Standards for Digital Video Evidence in the Era of Deepfakes
69 Am. U. L. Rev. 1945 (2020).
* Senior Staff Member, American University Law Review, Volume 70; J.D. Candidate, May 2021, American University Washington College of Law; B.A. History, 2009, Princeton University. I would like to thank the Law Review staff for their tireless assistance with this piece and my family of law school friends who have made this law school experience irreplaceable. Finally, I am forever grateful to my parents, who have never failed to lead the way.
The legal standard for authenticating photographic and video evidence in court has remained largely static throughout the evolution of media technology in the twentieth century. The advent of “deepfakes,” or fake videos created using artificial intelligence programming, renders outdated many of the assumptions that the Federal Rules of Evidence are built upon.
Rule 901(b)(1) provides a means to authenticate evidence through the testimony of a “witness with knowledge.” Courts commonly admit photographic and video evidence by using the “fair and accurate portrayal” standard to meet this Rule’s intent. This standard sets an extremely low bar—the witness need only testify that the depiction is a fair and accurate portrayal of her knowledge of the scene. In many cases, proponents’ ability to easily clear this hurdle does not raise concerns because courts rely on expert witnesses to root out fraudulent evidence; thus, although the fraudulent evidence might pass the fair and accurate portrayal standard, it would later be debunked in court.
The proliferation of deepfakes severely complicates the assumption that technological experts will be able to reliably determine real from fake. Although various organizations are actively devising means to detect deepfakes, the continued proliferation and sophistication of deepfakes will make debunking fake video more challenging than ever. Witnesses who attest to the fair and accurate portrayal standard will likely not be able to identify subtle but important alterations in deepfakes. As a result, fraudulent evidence, authenticated through the Rule 901(b)(1) standard, will increasingly enter courtrooms with a decreasing ability for witnesses and courts to identify fakes. Because the technology to detect deepfakes lags behind the creation methods, deepfakes present a critical threat to courtroom integrity under the current standard.
The rising probability that juries see fake videos warrants a higher burden on the proponent of video evidence. Requiring additional circumstantial evidence to corroborate video evidence is a small but crucial step that will mitigate, but not solve, the coming deepfakes crisis. Further engagement around this topic is necessary to address the deepfakes crisis before it creates irreparable harm.
“[R]eality is not external. Reality exists in the human mind, and nowhere else.”
—George Orwell1George Orwell, 1984 249 (New American Library ed. 1961) (1949).
Artificial intelligence and machine learning have enabled unprecedented leaps in mankind’s capability to solve the most pressing issues of the twenty-first century.2Machine learning is a subset of the broader application of artificial intelligence. While machine learning takes many forms, the “core notion is that the machine would be able to take data and learn . . . without human intervention.” Vijay Singh, What Is the Difference Between Machine Learning and Artificial Intelligence?, Data Sci. Cent. Blog (Sept. 22, 2018, 9:00 PM), https://www.datasciencecentral.com/profiles/blogs/what-is-the-difference-between-machine-learning-and-artificial [https://perma.cc/XAG7-XBX2]. Programmers and doctors have worked together to create artificially intelligent programs that synthesize data from millions of patients to diagnose illness with greater precision and speed than ever before.3Donna Marbury, How Health Systems Are Using AI and Future Predictions, Managed Healthcare Executive (Aug. 8, 2018), https://www.managedhealthcareexecutive.com/article/how-health-systems-are-using-ai-and-future-predictions [https://perma.cc/QJ6P-RC53]; New AI Model Tries to Synthesize Patient Data like Doctors Do, Pac. Northwest Nat’l Laboratory (Nov. 12, 2019), https://www.pnnl.gov/news-media/new-ai-model-tries-synthesize-patient-data-doctors-do [https://perma.cc/J7FW-PPX6]; see Emily Mullin, FDA Approves AI-Powered Diagnostic that Doesn’t Need a Doctor’s Help, MIT Tech. Rev. (Apr. 11, 2018), https://www.technologyreview.com/f/610853/fda-approves-first-ai-powered-diagnostic-that-doesnt-need-a-doctors-help [https://perma.cc/J4XT-DWDP] (providing an example of diagnostic software that detects illness using patient data). Soon, self-driving cars will relieve humans of the deadliest threat on our highways (ourselves).4Suhasini Gadam, Artificial Intelligence and Autonomous Vehicles, Medium (Apr. 19, 2018), https://medium.com/datadriveninvestor/artificial-intelligence-and-autonomous-vehicles-ae877feb6cd2 [https://perma.cc/L4JN-X34J]. However, notwithstanding the tremendous promise of improvement that artificial intelligence brings to our world, future generations may someday remember December 2017 as a seminal moment of the digital age that exposed the danger of advanced technological capabilities. As an internet technology website, Motherboard, first reported with great despair, in December 2017, a Reddit user with the online handle “deepfakes” created a series of videos utilizing new techniques that grafted the faces of several well-known actresses into pornographic videos.5Samantha Cole, AI-Assisted Fake Porn Is Here and We’re All Fucked, Vice (Dec. 11, 2017, 2:18 PM), https://www.vice.com/en_us/article/gydydm/gal-gadot-fake-ai-porn [https://perma.cc/AUR4-W36D]. Reddit, along with several pornographic websites, quickly featured explicit videos in which Daisy Ridley, Gal Gadot, and other actresses had never actually appeared.6Samantha Cole, We Are Truly Fucked: Everyone Is Making AI-Generated Fake Porn Now, Vice (Jan. 24, 2018, 1:13 PM), https://www.vice.com/en_us/article/bjye8a/reddit-fake-porn-app-daisy-ridley [https://perma.cc/P2D4-GASP].
The level of sophistication of this technology was still blossoming; Motherboard reported that “[i]t’s not going to fool anyone who looks closely. Sometimes the face doesn’t track correctly and there’s an uncanny valley effect at play, but at a glance it seems believable.”7Cole, supra note 5. Japanese roboticist Masahiro Mori coined the concept “uncanny valley,” used to describe a psychological phenomenon that occurs as a robot or android’s visual resemblance to the human likeness improves; our subconscious enjoyment of the visual experience increases until the robot’s likeness reaches a certain level of sophistication, at which point many feel “repulsive affects” that some describe as “creepy” or “eerie.” Shensheng Wang, Scott O. Lilienfeld & Philippe Rochat, The Uncanny Valley: Existence and Explanations, 19 Rev. Gen. Psychol. 393, 393, 396 (2015). However, over the past several years, “deepfakes”—colloquially named after the otherwise unidentified Reddit user who circulated fake pornographic videos—have evolved from videos whose alterations are reasonably discernible by the naked eye to fakes that are challenging for both the human eye and machine detection software to distinguish from real videos.8Editorial Board, A Reason to Despair About the Digital Future: Deepfakes, Wash. Post (Jan. 6, 2019, 7:10 PM), https://www.washingtonpost.com/opinions/a-reason-to-despair-about-the-digital-future-deepfakes/2019/01/06/7c5e82ea-0ed2-11e9-831f-3aa2c2be4cbd_story.html?utm_term=.f4f9e1e7b293 (“Deepfakes are also inherently hard to detect. The technology used to create them is trained in part with the same algorithms that distinguish fake content from real—so any strides in ferreting out false content will soon be weaponized to make that content more convincing.”). 
This progression is predominantly due to the advancement of processes for creating deepfakes that use machine learning programs to continuously improve the fidelity of the videos and render increasingly lifelike representations.9See infra Part I.A.
The coming proliferation of deepfakes has created no shortage of alarms in the legal, political, and social spheres, in which scholars predict countless challenges to organized society, ranging from celebrity harassment to political and governmental manipulation.10See, e.g., Hallie Jackson, Fake Obama Warning About ‘Deep Fakes’ Goes Viral, MSNBC (Apr. 19, 2018), https://www.msnbc.com/hallie-jackson/watch/fake-obama-warning-about-deep-fakes-goes-viral-1214598723984 (highlighting director Jordan Peele’s effort to educate the public about deepfakes by creating a realistic fake video of Barack Obama). Some scholars have already rushed to address regulatory challenges that deepfakes pose and identify civil remedies for victims of deepfake videos.11See, e.g., Elizabeth Caldera, Comment, “Reject the Evidence of Your Eyes and Ears”: Deepfakes and the Law of Virtual Replicants, 50 Seton Hall L. Rev. 177, 178 (2019) (arguing that the Federal Trade Commission is the best choice among administrative agencies to regulate deepfake technology); Douglas Harris, Deepfakes: False Pornography Is Here and the Law Cannot Protect You, 17 Duke L. & Tech. Rev. 99, 102–03 (2019) (arguing that a federal criminal law prohibiting fake pornographic videos is necessary to address deepfakes because state tort and non-consensual pornography laws are insufficient); Russell Spivak, “Deepfakes”: The Newest Way to Commit One of the Oldest Crimes, 3 Geo. L. Tech. Rev. 339, 340–41 (2019) (examining whether various state defamation or privacy tort causes of action are viable remedies or if they conflict with First Amendment protections). For example, many state privacy torts do not account for artificial rather than actual depictions of the victim,12See Danielle Keats Citron, Sexual Privacy, 128 Yale L.J. 1870, 1921–24, 1939 (2019) (highlighting the disconnect between privacy torts and pornographic deepfake videos because the fake video represents no physical intrusion or truthful, private facts). and First Amendment precedent is ill-equipped to deal with the expression of non-obscene but nonetheless manipulative fake videos.13See Spivak, supra note 11, at 358–64 (addressing the obscenity and child pornography exceptions to the First Amendment restraint on prohibiting a communication based on its content). However, despite some recognition that fake video is an imminent threat to courtroom integrity, lawmakers have done little to address the manner in which our evidentiary standards for authenticating photographic and video evidence must adapt to counter this threat.14See, e.g., Jeff Ward, 10 Things Judges Should Know About AI, 103 Judicature 12, 17 (2019) (positing that “fundamental civic institutions and processes” may be undermined if the “current rules of evidence do not keep pace with these advances”).
This Comment addresses the need for heightened evidentiary standards to counter the dangerous consequences of deepfakes, a need that is likely to become a central focus to our judicial process as prosecutors, plaintiffs, and defendants all turn to the courts to redress the threat and harms that deepfakes cause. Courts currently rely on an evidentiary standard that assumes authenticating witnesses have sufficient personal knowledge to attest to a photograph’s or video’s authenticity;15See Fed. R. Evid. 901(b)(1) (describing “[t]estimony that an item is what it is claimed to be” as sufficient to satisfy the authenticity requirement). this standard is now inadequate to meet the intent of the Federal Rules of Evidence. Recent amendments to the Federal Rules of Evidence in 2017 aimed to address the growing influx of electronic media, such as social media posts or websites, into courtrooms.16See Fed. R. Evid. 902(13)–(14) (declaring certified records and data as self-authenticating evidence, requiring “no extrinsic evidence of authenticity”). The amendments are “largely a reflection of the digital world in which we live.” Ramona L. Lampley, Something Old and Something New: Exploring the Recent Amendments to the Federal Rules of Evidence, 57 Washburn L.J. 519, 519–20 (2018) (providing a practical explanation and analysis of the impact of the 2017 amendments on Rules 803 and 902). However, the 2017 amendments did not replace or circumvent existing authentication requirements; instead, they allow the proponent of the evidence to offer authentication by certification rather than demanding witness testimony, which can be both costly and time-consuming.17Lampley, supra note 16, at 525. Since the 2017 amendments, deepfakes have burst into the national consciousness, and their potentially devastating consequences demand further examination into the authentication standard for photographic and video evidence. 
Ultimately, current authentication standards for photographs and video fail to account for the inability of witnesses, even those present at the scene depicted, to determine reality from forgery.
Part I of this Comment explores deepfake video creation and the unique difficulty in authenticating or debunking them. The novel creation process that utilizes machine learning networks not only enables extraordinarily high-fidelity forgeries but also severely complicates detection capabilities. Part I also introduces the psychological effect known as suggestibility, which makes deepfakes especially dangerous because of the human memory’s susceptibility to recall events that never happened, compounding the deepfakes problem. Part II outlines the current legal standard that courts use to lay a foundation for the authenticity of video evidence to satisfy the requirement of Rule 901(a) of the Federal Rules of Evidence, primarily through Rule 901(b)(1) or Rule 901(b)(9).
Part III argues that, because of the high fidelity of deepfakes, witnesses can no longer satisfy the recollection element of the personal knowledge standard established by Rule 602, and therefore cannot act as a witness with knowledge who testifies that a video is a fair and accurate portrayal of a scene. Witnesses can attest to the fair and accurate portrayal standard only by augmenting their recollection with speculation and, because of the psychological effects of suggestibility, are likely to believe the details with which their memories have filled the gaps. When this effect is combined with the conflation of illustrative and substantive evidence that photography and video create, courts are likely to admit substantive evidence for a jury to consider under far lower standards than Rule 901(a) intended. This Comment recommends a new addition to Rule 901 to establish a foundation of authenticity outside of the presence of the jury to mitigate the risk of unfair prejudice. This recommendation aims to alleviate the problem of deepfakes in the courtroom but admittedly does not solve the problem entirely.
Lastly, this Comment concludes that the current legal standard for establishing a foundation to authenticate videos fails to meet the original intent behind the evidence rules of authentication in light of new and continuously developing photographic and video technology. Transitioning to a heightened evidentiary standard is necessary to anticipate the upcoming deepfakes crisis in our courtrooms, rather than reacting to it as the technology permeates our society.
I. Deepfakes Background
Anyone remotely familiar with graphic design can attest to the relative ease with which various programs, such as Adobe Photoshop, can modify digital images. In fact, a post on Adobe’s blog “Adobe Life” invites Photoshop users to “reimagine reality.”18How Our Photoshop Floor Reimagines Reality, Adobe Life Blog (Apr. 4, 2018), https://blogs.adobe.com/adobelife/2018/04/04/adobe-photoshop-floor [https://perma.cc/GCZ7-3GXY]. The technology behind deepfakes, however, elevates this ability to a level previously unreachable for mainstream graphic design programs. Understanding how deepfakes technology ushers in a new era of manipulation requires grasping two concepts: first, how the creators use machine learning algorithms to generate videos with human likenesses at unprecedented levels of fidelity, and second, how this creation process frustrates current methods of determining real from fake.
A. Deepfakes Creation Through Generative Adversarial Net Machine Learning Cycles
The use of advanced machine learning techniques to create fake videos burst onto the scene in December 2017.19See Cole, supra note 5 (describing the fake videos that first introduced the world to the concept of “deepfakes”). The near-apocalyptic journalism that followed Motherboard’s exposure of the exploits of the “deepfakes” user on Reddit quickly caught the attention of technology commentators,20See, e.g., Karen Hao, Deepfakes Have Got Congress Panicking. This Is What It Needs to Do, MIT Tech. Rev. (June 12, 2019), https://www.technologyreview.com/s/ 613676/deepfakes-ai-congress-politics-election-facebook-social [https://perma.cc/AN78-4BFC] (explaining Congress’s early efforts to draft a deepfakes regulation bill to “spark a more nuanced conversation” rather than to actually pass the bill into law). mainstream news outlets,21See, e.g., Kevin Roose, Here Come the Fake Videos, Too, N.Y. Times (Mar. 4, 2018), https://www.nytimes.com/2018/03/04/technology/fake-videos-deepfakes.html [https://perma.cc/JBT8-F2LH] (highlighting deepfakes’ potential to be wielded as an ideological tool). and the government.22See, e.g., Deepfakes Report Act of 2019, H.R. 3600, 116th Cong. (2019) (requiring “the Secretary of Homeland Security to publish an annual report” on deepfakes and other digital forgery technology). Although the concept of doctoring digital photography (or other evidence, for that matter) is not new,23See David Levi Strauss, Doctored Photos–The Art of the Altered Image, TIME (June 13, 2011), https://time.com/3778075/doctored-photos-the-art-of-the-altered-image [https://perma.cc/4LQ8-CHFG] (demonstrating the existence of doctored photography since at least 2011). the budding creation process behind deepfakes enables creators to mimic reality in a devastatingly realistic fashion.
At the core of this new technology is a process called a generative adversarial net (GAN). University of Montreal Ph.D. student Ian Goodfellow led a 2014 scientific paper that first introduced GAN models.24Ian J. Goodfellow et al., Generative Adversarial Nets, arXiv (June 10, 2014), https://arxiv.org/pdf/1406.2661.pdf [https://perma.cc/ME86-SN74]. Although Goodfellow, now a research scientist at Google’s “Brain Team,” coined the modern term “GAN” and is credited with materializing GAN coding into reality, the idea of pitting machines against each other to learn has roots in the early years of computer programming. Martin Giles, The GANfather: The Man Who’s Given Machines the Gift of Imagination, MIT Tech. Rev. (Feb. 21, 2018), https://www.technologyreview. com/s/610253/the-ganfather-the-man-whos-given-machines-the-gift-of-imagination [https://perma.cc/4P5N-7AX6]; see also A.L. Samuel, Some Studies in Machine Learning Using the Game of Checkers, 3 IBM J. Res. & Dev. 210, 211 (1959) (exploring attempts to teach programs to play checkers strategically against one another in the early years of computer science growth). In the paper, the authors articulated a process in which two machine learning algorithms are simultaneously pitted against one another.25Chris Nicholson, A Beginner’s Guide to Generative Adversarial Networks (GANs), Pathmind, https://pathmind.com/wiki/generative-adversarial-network-gan [https://perma.cc/JEY9-K283]. One of these programs is a generative model that creates new data samples; the other, known as a discriminator model, evaluates this data against a training dataset for authenticity.26Id. “The generative model can be thought of as analogous to a team of counterfeiters, trying to produce fake currency and use it without detection, while the discriminative model is analogous to the police, trying to detect the counterfeit currency.” Goodfellow et al., supra note 24, at 1. 
The discriminator model estimates the probability that the sample came from the generative model (a machine creation) or sample data (a real-world reference).27Nicholson, supra note 25. These models are known as neural networks because they mimic organic brain function, with interconnected nodes layered to process information far more vast and complex than traditional computer algorithms.28Chris Nicholson, A Beginner’s Guide to Neural Networks and Deep Learning, Pathmind, https://pathmind.com/wiki/neural-network [https://perma.cc/WXD6-Y5NS]. These two neural networks operate in a cyclical fashion and learn from each other—the generative model program is learning to create false data, and the discriminator model is learning to identify whether the data is artificial.29Nicholson, supra note 25. The result is a process by which each element of the GAN model learns the other’s methods in a “constant escalation”;30Id. the generative model constantly improves its ability to create data sets that have a lower probability of failing the detection algorithm as the discriminator model learns to keep up, a process that continuously improves the fidelity of the creation.31Id. This continuous process enables the generative model to build a dataset that avoids the pitfalls that would normally give away a fraud.32Spivak, supra note 11, at 343–44. Spivak provides a useful illustration of GAN models by applying the cyclical learning process to signature styles of famous authors. A GAN programmer could train a discriminator model to learn the styles of, for example, James Joyce to the point where it can identify the author’s prosaic style embedded within other textual samples. Id. The generative model then creates new data sets (new pages of prose) for the discriminator to attempt to determine whether the new data was written by the generative model or came from the actual library of James Joyce. Id. 
After the generative model reveals the discriminator model to be right or wrong, the two models repeat the process continuously, with the generator fixing its mistakes until the discriminator can no longer reliably predict the probability of creation versus original. See id. Programmers can apply the same process to depictions of human movements and human voice. Id. at 351.
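The adversarial cycle described above can be made concrete with a deliberately small sketch. The Python below is an illustration only, not the code from the Goodfellow paper or from any deepfake tool. It assumes that single numbers stand in for video frames, that the generator has just two parameters, and that the discriminator is a simple logistic model; every name in it (train_toy_gan, real_mean, and so on) is hypothetical. The generator is updated to raise the discriminator's estimate that its output is real, which is one commonly used form of the generator objective.

```python
import math
import random


def sigmoid(t):
    # Numerically safe logistic function: maps any real number into (0, 1).
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def train_toy_gan(steps=500, batch=64, lr=0.05, real_mean=4.0, seed=0):
    """Alternate discriminator and generator updates on 1-D toy data.

    All names and default values here are illustrative assumptions.
    """
    rng = random.Random(seed)
    a, b = 1.0, 0.0  # generator ("counterfeiter"): fake = a * noise + b
    w, c = 0.0, 0.0  # discriminator ("police"): P(real) = sigmoid(w * x + c)

    for _ in range(steps):
        real = [rng.gauss(real_mean, 1.0) for _ in range(batch)]
        noise = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        fake = [a * z + b for z in noise]

        # Discriminator step: gradient ascent on
        # log D(real) + log(1 - D(fake)), i.e. learn to tell them apart.
        grad_w = grad_c = 0.0
        for x in real:
            err = 1.0 - sigmoid(w * x + c)
            grad_w += err * x
            grad_c += err
        for x in fake:
            err = -sigmoid(w * x + c)
            grad_w += err * x
            grad_c += err
        w += lr * grad_w / batch
        c += lr * grad_c / batch

        # Generator step: gradient ascent on log D(fake), i.e. adjust the
        # fakes so the freshly updated discriminator rates them as more real.
        grad_a = grad_b = 0.0
        for z in noise:
            x = a * z + b
            slope = (1.0 - sigmoid(w * x + c)) * w  # d log D(x) / dx
            grad_a += slope * z
            grad_b += slope
        a += lr * grad_a / batch
        b += lr * grad_b / batch

    return {"a": a, "b": b, "w": w, "c": c}


if __name__ == "__main__":
    print(train_toy_gan())
```

In this toy setting the generator's offset b starts at 0 and is pushed toward the real data as soon as the discriminator learns to separate the two. Toy dynamics of this kind can oscillate rather than settle, which offers a small-scale picture of the “constant escalation” between the two models.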
There are countless commercial and consumer applications of GAN technology. Chris Nicholson33Chris Nicholson is the CEO of Pathmind Inc., a Silicon Valley artificial intelligence services provider. Pathmind, https://pathmind.com/about [https://perma.cc/9FY9-64SP] (last visited August 6, 2020). has aptly described the breadth of GAN’s incredible scientific potential, stating that “[GAN models] can learn to mimic any distribution of data. That is, GANs can be taught to create worlds eerily similar to our own in any domain: images, music, speech, prose. They are robot artists in a sense, and their output is impressive—poignant even.”34Nicholson, supra note 25. The artistic applications are endless. Some fields, such as the film industry, have already employed ultra-lifelike human likenesses using a variety of methods.35See, e.g., Rogue One: A Star Wars Story (Lucasfilm 2016). The film prominently features actor Peter Cushing in his 1977 role as Grand Moff Tarkin, twenty-two years after Cushing’s death in 1994. Jason Guerrasio, The Actor Behind the CGI Tarkin in ‘Rogue One’ Tells Us How He Created the Character, Bus. Insider (Jan. 9, 2017, 12:35 PM), https://www.businessinsider.com/cgi-moff-tarkin-rogue-one-guy-henry-2017-1 [https://perma.cc/WEA4-M6SB]. Industrial Light & Magic used the related but distinct technology of computer graphic imaging with motion capture dots techniques to recreate Cushing’s likeness. Id. Another example of the film industry’s use of GAN technology is Finding Jack. See Finding Jack (Magic City Films, forthcoming 2020). The film stars James Dean in a leading role sixty-four years after his death in a 1955 car crash. Jesse Damiani, James Dean to Be Digitally Resurrected to Appear in His Fourth Film, ‘Finding Jack’, Forbes (Nov. 7, 2019, 8:32 AM), https://www.forbes. com/sites/jessedamiani/2019/11/07/james-dean-to-be-digitally-resurrected-to-appear-in-his-fourth-film-finding-jack/#1d5fff933102 [https://perma.cc/7RCV-PUV4]. 
Researchers are also developing GAN technology for commercial purposes such as enabling shoppers to picture what an article of clothing looks like on a particular person (without the burden of actually trying it on)36Donggeun Yoo et al., Pixel-Level Domain Transfer, arXiv (Nov. 28, 2016), https://arxiv.org/pdf/1603.07442.pdf. or devising stronger encryption techniques to protect confidential information and communications online.37Martín Abadi & David G. Andersen, Learning to Protect Communications with Adversarial Neural Cryptography, arXiv (Oct. 21, 2016), https://arxiv.org/pdf/1610.06918.pdf [https://perma.cc/H4GA-J23Q].
Naturally, as benign use of the technology spreads, the dark side of video manipulation is accelerating with equal speed as GAN modeling becomes more widely accessible to those with less noble intentions.38See Rory Cellan-Jones, Deepfakes Videos ‘Double in Nine Months’, BBC (Oct. 7, 2019), https://www.bbc.com/news/technology-49961089 [https://perma.cc/5WBG-UX93] (discussing a September 2019 study from cybersecurity company Deeptrace that found 14,698 deepfake videos online compared to only 7,964 in December 2018). Actor and director Jordan Peele created deepfake videos of Barack Obama making speeches that never happened to highlight their danger to civil society.39Kaylee Fagan, A Viral Video that Appeared to Show Obama Calling Trump a ‘Dips–’ Shows a Disturbing New Trend Called ‘Deepfakes’, Bus. Insider (Apr. 17, 2018, 4:48 PM), https://www.businessinsider.com/obama-deepfake-video-insulting-trump-2018-4 [https://perma.cc/BX63-RVNK]. Politicians are a natural target for deepfake creators because of the volume of publicly available photographs and videos of politicians for the creators to utilize. Malign creators, whether domestic or foreign, can use deepfakes to further drive America’s political polarization and create the sort of “dystopia” that Jordan Peele warned of in his message.40Roose, supra note 21 (predicting that “[p]eople will share them when they’re ideologically convenient and dismiss them when they’re not”).
Further, despite Reddit’s and several pornographic websites’ efforts to ban deepfake pornography,41Janko Roettgers, Reddit, Twitter Ban Deepfake Celebrity Porn Videos, Nasdaq (Feb. 7, 2018, 2:11 AM), https://www.nasdaq.com/articles/reddit-twitter-ban-deepfake-celebrity-porn-videos-2018-02-07. malicious actors can still create and distribute deepfake celebrity or otherwise nonconsensual pornographic material in other less regulated corners of the internet. As the software to create lifelike deepfakes proliferates, the degree of difficulty and the skill required to create such videos is dropping, leaving convincing and powerful weapons in the hands of a larger number and greater variety of malevolent actors.42See Larry N. Zimmerman, Cheap and Easily Manipulated Video, 87 J. Kan. B. Ass’n 20, 20 (2018) (comparing the dismissive attitudes following Hollywood’s video manipulations in the 1990s with the current reality of software that makes “face-swapping simple for anyone regardless of skill or equipment”).
B. The Challenge of Finding Reliable and Lasting Detection Methods
As GAN programming continues to develop and expand, the ability to detect deepfakes becomes increasingly important in a variety of disciplines. The challenge of reliably and consistently detecting deepfakes further evinces the new era of digital forgery that they have ushered in. The challenge stems from the constantly evolving and cyclical method of deepfake creation.43See supra Part I.A. The very process that programmers use to create deepfakes relies on incorporating algorithms designed to detect the subsets of data that do not match sample data sets provided to the discriminator model; this cycle’s purpose is to root out inconsistencies.44Will Knight, The US Military Is Funding an Effort to Catch Deepfakes and Other AI Trickery, MIT Tech. Rev. (May 23, 2018), https://www.technologyreview.com/s/611146/the-us-military-is-funding-an-effort-to-catch-deepfakes-and-other-ai-trickery [https://perma.cc/X2NP-AVKX]. This process therefore features a unique defense against programs that detect the frauds: any time a new method of determining whether a video is fake emerges, deepfake creators can use that method to their advantage in the GAN cycle.45Nicholson, supra note 25 (comparing this process to the “game of cat and mouse” between a police officer learning to detect false notes and a counterfeiter improving her ability to pass false notes by learning the police officer’s methods).
For example, Associate Professor of Computer Science Siwei Lyu of the University at Albany conducted a study in 201846Yuezun Li, Ming-Ching Chang & Siwei Lyu, In Ictu Oculi: Exposing AI Generated Fake Face Videos by Detecting Eye Blinking, arXiv (June 11, 2018), https://arxiv.org/pdf/1806.02877v2.pdf [https://perma.cc/KB8N-YZGT]. on the then-current state of deepfake technology with the intent of attempting to pinpoint the reason that the fake videos “felt eerie to him, and not just because he knew they [had] been ginned up.”47Sarah Scoles, These New Tricks Can Outsmart Deepfake Videos—For Now, Wired (Oct. 17, 2018, 7:00 AM), https://www.wired.com/story/these-new-tricks-can-outsmart-deepfake-videosfor-now [https://perma.cc/3G9B-6JEB]. Professor Lyu identified one of the signs that a human likeness had been artificially created: there was something wrong with the way that the human depictions blinked.48Id. The faces depicted in the deepfakes did not “open and close their eyes at the rates typical of actual humans” because the GAN model simply did not “get blinking” (at least not yet).49Id. Professor Lyu’s paper was a breakthrough in fake video detection by using forensic programs to catch “spontaneous and involuntary physiological activities such as breathing . . . and eye movement, [which] are oftentimes overlooked in the synthesis process of fake videos.”50Li et al., supra note 46. For the time being, Professor Lyu had struck a major victory against deepfake creators.
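The intuition behind a blink-based test can be illustrated with a short sketch. The Python below is not Professor Lyu's method; it is a simplified stand-in that assumes some upstream step has already labeled each video frame as eyes open or eyes closed, and it merely counts blinks per minute and flags clips that fall below a cutoff. The function names and the min_rate cutoff are hypothetical values chosen for illustration.

```python
def count_blinks(eye_open):
    """Count open-to-closed transitions in a per-frame list of booleans.

    A clip that begins with the eyes already closed counts as one blink.
    """
    blinks = 0
    previously_open = True
    for is_open in eye_open:
        if previously_open and not is_open:
            blinks += 1
        previously_open = is_open
    return blinks


def blinks_per_minute(eye_open, fps):
    """Convert a blink count into a rate, given the frames per second."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    if not eye_open:
        return 0.0
    minutes = len(eye_open) / fps / 60.0
    return count_blinks(eye_open) / minutes


def looks_suspicious(eye_open, fps, min_rate=5.0):
    """Flag a clip whose blink rate falls below min_rate.

    min_rate is an assumed cutoff for illustration, not a validated one.
    """
    return blinks_per_minute(eye_open, fps) < min_rate


if __name__ == "__main__":
    # One minute at 30 frames per second, with a 3-frame blink every 4 seconds.
    natural = ([True] * 117 + [False] * 3) * 15
    # One minute in which the depicted face never blinks.
    unblinking = [True] * 1800
    print(blinks_per_minute(natural, 30), looks_suspicious(natural, 30))
    print(blinks_per_minute(unblinking, 30), looks_suspicious(unblinking, 30))
```

A single fixed cutoff of this kind is exactly the sort of published tell that a forger can train against, which is why the sketch should be read as a picture of the idea rather than a durable detector.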
However, while Professor Lyu’s success certainly challenged the forgers by rooting out the flaws in their product,51See supra notes 47–49 and accompanying text. the victory was nonetheless muted by the very nature of the deepfake process. Not long after publishing the paper, Lyu’s team began to receive anonymous emails that contained deepfake videos whose stars blinked more normally and therefore passed the detection tests his team had created.52Scoles, supra note 47. The creators had incorporated a means of detection that the discriminator algorithm had previously not accounted for strongly enough and provided additional reference points for the algorithm to learn from (for example, pictures and videos of humans with their eyes closed, which were underrepresented in the sample data).53Id. The discriminator then did a better job policing the generative model’s fakes, essentially teaching the generative model how to overcome its prior weaknesses.54Id. The short-lived success of the detection program actually made the forgery mechanism stronger.55The process is analogous to bacteria growing stronger by developing immunity to the antibiotics created to defeat them; each advancement in defeating bacteria produces strains of the bacteria naturally resistant to the antibiotic. How Antibiotic Resistance Happens, Ctrs. for Disease Control & Prevention (Feb. 10, 2020), https://www.cdc.gov/drugresistance/about/how-resistance-happens.html [https://perma.cc/VKS5-7ZZ6]. The result is an “arms race between the creators and the detectors.”56Scoles, supra note 47.
Through a program called Media Forensics (MediFor), the Defense Advanced Research Projects Agency has been following the challenge of deepfake emergence since even before the videos’ namesake Reddit user popularized the concept in December 2017.57Media Forensics (MediFor), Def. Advanced Res. Projects Agency, https://www.darpa.mil/program/media-forensics [https://perma.cc/NK5A-XXVA]; see also Knight, supra note 44. Among MediFor’s lines of effort is an automated system designed to create an “integrity score” for an image or video, in which the content of the video is compared against a variety of external empirical facts to root out inconsistencies.58Scoles, supra note 47. MediFor is attempting to create an integrity score for videos by layering several test models. Id. One model looks for certain background characteristics, such as background noise that is particular to a certain camera model. Id. The next looks at physical characteristics, such as whether shadows or reflections are consistent with the location of the light source. Id. The last is a “semantic level” model, which compares the video to context that the model knows to be true, such as whether the weather depicted matches the weather report for the date of the scene. Id. MediFor seeks to create prototype systems that can stack these levels into a quantifiable “integrity score.” Id. Efforts such as these will always be chasing the forgers, and their breakthroughs will always provide ammunition to the GAN models.59For additional attempts to overcome this challenge, see Dr. Herb Lin’s article, which suggests the possibility of using digital signatures as a strategy to authenticate digital recordings despite Canon and Nikon’s failed attempts to overcome the technological challenge posed by would-be forgers. Herb Lin, The Danger of Deepfakes: Responding to Bobby Chesney and Danielle Citron, Lawfare (Feb. 27, 2018, 7:00 AM), https://www.lawfareblog.com/danger-deepfakes-responding-bobby-chesney-and-danielle-citron [https://perma.cc/67T7-XPR2]. Every data point that gives up a video as fake (such as weather reports to cross-reference against the scene or incorrectly angled shadows that are incongruent with the position of the sun) is a source that deepfake creators can account for by tapping into those data streams for future videos.
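The layered approach described above can be sketched in Python as a weighted combination of per-layer scores. This is a hypothetical illustration, not MediFor's actual system: the `LayerScores` fields, the 0-to-1 scale, and the weighting scheme are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class LayerScores:
    """Hypothetical per-layer consistency scores, each between 0.0 and 1.0."""
    background: float  # e.g., noise pattern consistent with the claimed camera model
    physical: float    # e.g., shadows and reflections consistent with the light source
    semantic: float    # e.g., depicted weather consistent with external records


def integrity_score(scores: LayerScores,
                    weights: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    """Stack the three layers into one score via a weighted average.

    The weights are illustrative assumptions; equal weighting is the default.
    """
    values = (scores.background, scores.physical, scores.semantic)
    if any(not 0.0 <= v <= 1.0 for v in values):
        raise ValueError("each layer score must lie between 0.0 and 1.0")
    if len(weights) != 3 or any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("need three non-negative weights with a positive sum")
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Example: a clip that passes the background and physical checks
# but conflicts with the weather record for the claimed date.
clip = LayerScores(background=0.9, physical=0.8, semantic=0.1)
print(round(integrity_score(clip), 2))
```

A single number of this kind also exposes the weakness the text identifies: each input to the score is a signal that a forger who knows the checks can try to satisfy.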
C. Fake Video’s Significant Psychological Effects on Viewers
Fraudulent evidence has always been a concern for courtroom integrity. Yet deepfakes raise an even greater level of concern due not only to their ability to seem real, but also to their impact on viewers. The threat that deepfakes pose to courtroom factfinding is not solely due to the high-fidelity human likenesses that are difficult to detect. Viewing video elicits psychological effects in which people come to believe that they remember things that they did not actually perceive.60Hadley Leggett, Fake Video Can Convince Witnesses to Give False Testimony, Wired (Sept. 14, 2009, 6:02 PM), https://www.wired.com/2009/09/falsetestimony [https://perma.cc/M88G-8TKJ]. This combination is extremely dangerous to witness reliability.
Although many conceive of human memories as an internal video playback system, various studies have shown critical vulnerabilities in our ability to recall memories accurately.61See Mark W. Bennett, Unspringing the Witness Memory and Demeanor Trap: What Every Judge and Juror Needs to Know About Cognitive Psychology and Witness Credibility, 64 Am. U. L. Rev. 1331, 1335–37, 1352 (2015) (examining a host of challenges to accurate witness testimony and proposing a “Model Plain English Witness Credibility Instruction”). Memory is more comparable to “putting puzzle pieces together than retrieving a video recording,”62Hal Arkowitz & Scott O. Lilienfeld, Why Science Tells Us Not to Rely on Eyewitness Accounts, Sci. Am. (Jan. 1, 2010), https://www.scientificamerican.com/article/do-the-eyes-have-it (quoting psychologist and memory researcher Professor Elizabeth F. Loftus). and is therefore subject to a range of “potential mischief” from both internal and external sources.63Bennett, supra note 61, at 1336. There are a variety of psychological limitations on the accuracy of human memory; the most relevant to deepfakes is “suggestibility.”64See id. at 1342–44 (citing Daniel L. Schacter, The Seven Sins of Memory: How the Mind Forgets and Remembers 4 (2001)) (dividing the malfunctions of memory into seven categories: transience, absent-mindedness, blocking, misattribution, bias, persistence, and, most relevant here, suggestibility). Suggestibility is a phenomenon that causes a person to implant memories as a result of leading questions, narratives, or visuals when attempting to recall a past experience.65Schacter, supra note 64, at 5. Due to suggestibility, reconstruction of an experience in the context of prepared materials or leading questions intended to help tell a desired narrative “can cause the witness’[s] memory to change by unconsciously blending the actual fragments of memory of the event with information provided during the memory retrieval process.”66See Richard S. Schmechel et al., Beyond the Ken? Testing Jurors’ Understanding of Eyewitness Reliability Evidence, 46 Jurimetrics 177, 195 (2006) (presenting an independent study of the ability of potential jurors in the District of Columbia to understand limitations on the reliability of eyewitness identification under various strenuous circumstances).
Video exacerbates suggestibility’s effect on memory. In 2010, researchers at the University of Warwick conducted a study illustrating the psychological effect that video has on reconstructing personal observations.67Kimberly A. Wade, Sarah L. Green & Robert A. Nash, Can Fabricated Evidence Induce False Eyewitness Testimony?, 24 Applied Cognitive Psychol. 899, 900 (2010). The researchers placed sixty college students in a room to engage in a computerized gambling task.68Id. at 901–02. Following completion of the task, researchers individually showed each subject digitally altered video depicting a co-subject cheating, when in fact none of the subjects had actually cheated.69Id. at 903–04. Nearly half of the subjects were willing to testify that they had personally witnessed a co-subject cheating after seeing the fake video; only one in ten was willing to testify to the same effect after the researcher merely told the subject about the cheating, rather than showing the fake video evidence.70Leggett, supra note 60. “[R]esearchers emphasized that no one should testify unless they were 100 percent sure they had seen their partner cheat.” Id.
Consequently, deepfakes can have a devastating effect on courtroom integrity. If a party submits a deepfake video to the court, its deceptive harm is not limited solely to the video itself. The lies embedded within a fake video cascade into other portions of the proceedings; viewing fake videos is likely to affect the testimony of witnesses concerning their recollection of events.71Wade et al., supra note 67, at 901, 904. The legal standard to admit video evidence into a courtroom for a jury to see is unfortunately ill-equipped to address this level of risk.
II. Authenticating Photographic and Video Evidence
While technology generally outpaces the law, it is imperative to discern whether the contemporary legal framework is sufficient to address the potential harm that technological advances present. Some scholars and commentators have grappled with the interplay of deepfakes with privacy law, First Amendment rights, and regulatory challenges.72See, e.g., Daniel de Zayas, Legal Means to Prosecute Actors Behind Deepfakes, Am. U. Nat’l Security L. Brief (Sept. 23, 2019), https://nationalsecuritylawbrief.com/2019/09/23/legal-means-to-prosecute-actors-behind-deepfakes [https://perma.cc/P89N-CFAD] (identifying pre-existing legislation to prosecute creators and distributors of deepfakes); Harris, supra note 11, at 107, 110–11 (examining the insufficiency of nonconsensual pornography laws to address the deepfake crisis); Spivak, supra note 11, at 358–62 (contrasting First Amendment jurisprudence on obscenities and child pornography with how a court would likely rule on deepfakes). Additionally, deepfakes bring the possibility of unprecedented levels of distrust in the government and other public institutions if videos emerge featuring public figures saying or doing things that never happened.73See Bobby Chesney & Danielle Citron, Deep Fakes: A Looming Challenge for Privacy, Democracy, and National Security, 107 Calif. L. Rev. 1753, 1779 (2019) (illustrating how deepfakes may be used to harm society and cautioning that public institutions in which the public’s trust may be eroded by deepfakes “includ[e] elected officials, appointed officials, judges, juries, legislators, staffers, and agencies”). Among the challenges specific to trust in public institutions is that which courtrooms will face in light of the current standards used to admit digital photography and video as evidence.
Common law standards initially governed the admissibility of photographic and video evidence; the McKeever test, originally a standard for admitting audio recordings, stood as a model for admissibility for decades.74See infra notes 80–83 and accompanying text. The McKeever test began as a strict standard, but it eventually became more flexible as photographic and video evidence became more common in courtrooms.75See infra notes 80–86 and accompanying text. The McKeever test later gave way to the Federal Rules of Evidence, which codified the test’s main components.76See infra notes 87–88 and accompanying text. As states codified their own evidence standards based on the Federal Rules of Evidence, courts began to use two theories, the pictorial communication theory and the silent witness theory, to authenticate photographic and video evidence under Rule 901(b).77See infra Part II.B.1–2. This Part discusses the history of the standard of admissibility of photographic and video evidence, two common theories under which courts admit such evidence, and the guide that Federal Rule of Evidence 901 provides for authenticating such evidence.
A. The Evolution of Photographic and Video Evidence Authentication
Suspicion of the susceptibility of photographic and video evidence78The Federal Rules of Evidence define a photograph as “a photographic image or its equivalent stored in any form.” Fed. R. Evid. 1001(c). Video is treated largely similarly to digital and traditional photography for authentication. See, e.g., Linde v. Arab Bank, PLC, 97 F. Supp. 3d 287, 338 (E.D.N.Y. 2015) (authenticating videos “on the same principles as still photographs”), vacated on other grounds, 882 F.3d 314 (2d Cir. 2018). For the purposes of this Comment, photographic evidence generally refers to video evidence as well. to modification or tampering is nothing new to courtrooms; courts have articulated their concerns over photographs and motion pictures since the invention of photography, and such concern continued as photography became more prevalent in society.79See, e.g., Cowley v. People, 83 N.Y. 464, 478 (1881) (“The portrait and the photograph may err, and so may the witness. That is an infirmity to which all human testimony is lamentably liable.”); Gibson v. Gunn, 202 N.Y.S. 19, 20 (App. Div. 1923) (per curiam) (commenting that “moving pictures present a fertile field for exaggeration of any emotion or action” while separately considering the manipulative effect of its lack of relevance). The modern standard for video authentication prior to admission initially mirrored the strict standards that courts used for sound recordings.80Jill Witkowski, Note, Can Juries Really Believe What They See? New Foundational Requirements for the Authentication of Digital Images, 10 Wash. U. J.L. & Pol’y 267, 279 (2002). For decades, courts used the seven-part McKeever test81United States v. McKeever, 169 F. Supp. 426, 430 (S.D.N.Y. 1958) (requiring that the proponent show: “(1) That the recording device was capable of taking the conversation now offered in evidence. (2) That the operator of the device was competent to operate the device. (3) That the recording is authentic and correct. (4) That changes, additions or deletions have not been made in the recording. (5) That the recording has been preserved in a manner that is shown to the court. (6) That the speakers are identified. (7) That the conversation elicited was made voluntarily and in good faith, without any kind of inducement”), rev’d on other grounds, 271 F.2d 669 (2d Cir. 1959). as the standard to admit sound recordings as evidence.82Witkowski, supra note 80, at 276–77. The McKeever test required the proponent to establish authenticity based on seven elements at a hearing prior to admission and was eventually expanded to include video evidence.83Id. at 279 (citing McKeever, 169 F. Supp. at 430).
As photographs, motion pictures, and recordings became more familiar and common in daily life, their use in court expanded.84See 2 Kenneth S. Broun et al., McCormick on Evidence § 215 (7th ed. 2013) (describing the different ways that photographs are used in courts). “As judges, counsel and the lay public have become accustomed to the prevalence of such recordings in court, their persuasive potential is both widely acknowledged and the subject of concern.” Id. § 216, at 35. Accordingly, courts loosened the McKeever test over time and eventually set it aside in favor of more lenient standards.85Witkowski, supra note 80, at 279 (“Over time, however, the courts replaced the strict foundational requirements concerning the process of taking motion pictures with the admission of witness testimony that the film was a fair and accurate representation of what actually happened.”); see also Edward J. Imwinkelried, Evidentiary Foundations § 4.09 (9th ed. 2015) (stating that although “the courts were initially very conservative in their treatment of motion pictures,” “[t]he law governing the admission of motion pictures has been liberalized in recent years”). Interpreting the McKeever test as “a guide rather than a rule,” and adopting more relaxed tests, courts determined that trial judges should have “wide latitude” to determine whether a proponent of recordings had laid a sufficient foundation for a reasonable jury to conclude that it was authentic.86Witkowski, supra note 80, at 278; see also United States v. Branch, 970 F.2d 1368, 1371–72 (4th Cir. 1992) (finding the McKeever factors sufficient but not required to establish a foundation for authenticity); United States v. Biggins, 551 F.2d 64, 66–67 (5th Cir. 1977) (holding that the court “neither adopt[ed] nor reject[ed] [the McKeever test] as a whole” and looking to four factors as a guideline that is not intended to “sacrifice [evidence] to a formalistic adherence to the standard [the court] establish[ed]”).
The authentication standard eventually transitioned from the common law to codification after Congress passed the Federal Rules of Evidence in 1975 after decades of study, delay, and deliberation.87An Act to Establish Rules of Evidence for Certain Courts and Proceedings, Pub. L. No. 93-595, 88 Stat. 1926 (1975) (codified as amended at 28 U.S.C. §§ 2072–2074 (2018)). The Judicial Conference responsible for implementing the Rules Enabling Act of 1934 did not formally study a uniform evidence code until 1961 and finally submitted its proposed rules to Congress for approval in 1972. Paul R. Rice & Neals-Erik William Delker, A Short History of Too Little Consequence, 191 F.R.D. 678, 682–84 (2000). The rules reflected the standards for admissibility of videos that courts had adopted since relaxing the McKeever test: relevance (codified in Rule 401), probative value balanced against undue prejudice (codified in Rule 403), and accuracy (codified in the sufficient to support a finding standard in Rule 901).88Witkowski, supra note 80, at 279–80; see Fed. R. Evid. 401, 403, 901. Forty-two states have adopted the Uniform Rules of Evidence (based on the Federal Rules of Evidence).89Gregory P. Joseph, Modern Visual Evidence § 1.02 (2005) (explaining that “[e]ven in states without codification, the courts frequently look to the Federal Rules for guidance, occasionally going so far as to adopt particular rules as a matter of decisional law. The Federal Rules of Evidence have thus come to set the standard of evidence law nationally, in the state as well as the federal courts”).
The authenticity of evidence is ultimately a factual determination for the trier of fact (typically, but not necessarily, a jury) to evaluate.90United States v. Branch, 970 F.2d 1368, 1370 (4th Cir. 1992) (citing Fed. R. Evid. 104 advisory committee’s note to subdivision (b) (“If the evidence is not such as to allow a finding [that a jury could reasonably conclude authenticity], the judge withdraws the matter from their consideration.”)). However, before a court admits evidence for the jury to consider, the court “must determine whether its proponent has offered a satisfactory foundation from which the jury could reasonably find that the evidence is authentic.”91Id. See generally Imwinkelried, supra note 85, § 4.01 (outlining the procedure for authentication under Rule 901). The process by which a judge addresses proper foundation for authentication does not itself establish evidence as authentic; the jury is still responsible for the ultimate determination of authenticity and therefore credibility.92Branch, 970 F.2d at 1370–71.
Rule 901(a) states that to establish a proper foundation for authenticating evidence, “the proponent must produce evidence sufficient to support a finding that the item is what the proponent claims it is.”93Fed. R. Evid. 901(a). While Rule 901(a) is not particularly specific in its mandate, Rule 901(b) provides a variety of means through which a party can satisfy Rule 901(a), such as nonexpert opinions about handwriting or evidence derived from public records.94Fed. R. Evid. 901(b)(2), 901(b)(7). Rule 901(b), however, is not exhaustive; there are other means of satisfying Rule 901(a)’s sufficient evidence standard, such as through circumstantial evidence that provides indicia of authenticity.95Fed. R. Evid. 901 advisory committee’s note to subdivision (b) (“The examples are not intended as an exclusive enumeration of allowable methods but are meant to guide and suggest, leaving room for growth and development in this area of law.”); Paul R. Rice & Roy A. Katriel, Evidence: Common Law and Federal Rules of Evidence § 7.02[A][a] (5th ed. 2005) (“[T]here are no inherent limitations on the means by which one can circumstantially authenticate a piece of evidence.”); see infra Part III.C (recommending requiring the use of means other than Rule 901(b)(1) to establish a foundation for video evidence).
B. Theories of Authenticating Photographic and Video Evidence
In alignment with Rule 901(b)’s various means of authenticating evidence, courts typically admit photographic evidence under one of two theories: the “pictorial communication” theory and the “silent witness” theory.962 Broun et al., supra note 84, § 215. Each theory utilizes a different sub-section of Rule 901(b) to meet Rule 901(a)’s sufficient evidence standard for authentication.97Id.; see infra Parts II.B.1–2.
The logic behind distinct foundational standards for the pictorial communication theory and silent witness theory hinges on the intended purpose of substantive as opposed to illustrative evidence. Substantive evidence provides an “independent probative value for proving a fact,” such as a physical object recovered from a scene relevant to the case.982 Broun et al., supra note 84, § 212. Illustrative evidence, on the other hand, accompanies witness testimony and is intended to “aid the trier [of fact] in understanding the witness’s testimony.”99Id. The distinction is important but problematic in the context of photographs and videos because illustrative evidence often becomes substantive by showing the jury more than the witness can recollect or convey, thereby introducing independent, substantive evidence for which there is no foundation.100Id.; see infra Part III.A (arguing that the vast substantive detail that video conveys to a jury exceeds the illustrative intent of the pictorial communication theory). Nonetheless, the pictorial and silent witness theories derive their separate standards from the supposition that illustrative evidence is limited to the perceptions and recollections of the witness’s testimony.1012 Broun et al., supra note 84, § 215.
1. The pictorial communication theory
Courts most commonly admit photographic evidence as illustrative evidence, intended to accompany a witness’s testimony.102Id. This application of photographic evidence is known as the pictorial communication theory, in which photographic evidence is intended to be viewed “merely as a graphic portrayal” to supplement a witness’s oral testimony.103Id. Under the pictorial communication theory, the typical means of establishing a foundation for authentication is Rule 901(b)(1), which provides that a “[w]itness with [k]nowledge” testify that an item is what it is claimed to be.104Fed. R. Evid. 901(b)(1).
Rule 901(b)(1)’s method for establishing an evidentiary foundation is nearly as vague as Rule 901(a)’s standard that it seeks to meet. Applying Rule 901(b)(1), a proponent establishes a foundation for photographic evidence if a witness testifies that the photograph is a “correct and accurate representation of relevant facts personally observed by the witness.”1052 Broun et al., supra note 84, § 215. Courts commonly refer to this rule as the “fair and accurate portrayal” standard.106Id. The fair and accurate portrayal standard was a common law standard prior to the adoption of the Federal Rules of Evidence, at which point it was incorporated into applying Rule 901(b)(1) to photographic evidence. Id. “Rule 901 is little more than a delineation of the methods of authentication that courts recognized under the common law.” Rice & Katriel, supra note 95, § 7.02[B][a]; see, e.g., Kooyumjian v. Stevens, 135 N.E.2d 146, 151 (Ill. App. Ct. 1956) (applying the common law principle of fair and accurate portrayal prior to the adoption of the Federal Rules of Evidence).
The fair and accurate portrayal standard assumes that video is difficult to alter; the standard is rooted in an age of traditional film photography, prior to the advent of digital photography and other media.107Witkowski, supra note 80, at 282 & n.65. Traditional photography differs from digital media (whether still photography or video) in several ways.108See id. at 269–71 (outlining the digital image creation process in scientific detail concerning image compression and physical characteristics). The most relevant difference is that digital media stores individual pixels as data in an electronic file; there is no traditional original image of the kind that exists with, for example, older thirty-five millimeter film cameras.109Id. at 272–73. Traditional film cameras capture light data as imprinted onto physical film, which can then be protected through a secure chain of custody.110Id. at 268 n.3, 272. Digital photography, however, as a “finite set of ones and zeroes,” makes determining whether a digital photograph is an original or a copy nearly impossible.111Id. at 272. But see John M. Facciola & Lindsey Barrett, Law of the Foal: Careful Steps Towards Digital Competence in Proposed Rules 902(13) and 902(14), 1 Geo. L. Tech. Rev. 6, 11–12 (2016) (explaining how iPhone software captures the date, time, and GPS coordinates of pictures as metadata while subsequently acknowledging the possibility that it could be altered); Lin, supra note 59 (suggesting the possibility of “digital signatures” to ensure image security).
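A short Python sketch illustrates the point about originals and copies. It is a toy example built on stated assumptions: the byte string merely stands in for a digital image file, and SHA-256 is simply one common way to fingerprint data, not a technique that the sources cited above prescribe.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


# Stand-in for the bytes of a digital photograph (hypothetical data).
original = bytes(range(256)) * 4

# A copy of a digital file reproduces every bit exactly.
copy = bytes(bytearray(original))

# Flipping a single bit simulates a minimal alteration.
altered_buffer = bytearray(original)
altered_buffer[0] ^= 0b00000001
altered = bytes(altered_buffer)

print(fingerprint(original) == fingerprint(copy))     # prints True
print(fingerprint(original) == fingerprint(altered))  # prints False
```

On the data alone, the copy cannot be told apart from the original. A fingerprint recorded at the time of capture could reveal a later change, but only if that record is itself kept trustworthy.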
Additionally, because early digital photography featured lower initial image quality compared to film photography, its proponents commonly needed to enhance digital photographs to aid the trier of fact.112Witkowski, supra note 80, at 269 n.6, 271 n.16 (citing Herb Blitzer, Creating the Digital Image SOP, L. Enforcement Tech. 58–61 (June 2000), http://desksgt.com/Classes/Reading/digitalimagesop.pdf [https://perma.cc/S29H-EK9V]). “In general, both traditional photographs and digital images often need to be enhanced. Enhancing an image involves adjusting the contrast so that the picture is clearer.” Witkowski, supra note 80, at 271 n.17. Thus, an abundance of cases have addressed the issue of non-insidious modifications of video, such as editing, enhancing, taping over, or curating certain portions of a longer video or recording.113See, e.g., United States v. Seifert, 445 F.3d 1043, 1045–46 (8th Cir. 2006) (admitting digitally enhanced surveillance tape after expert video analyst’s testimony about each step of the digital enhancement process); United States v. Mills, 194 F.3d 1108, 1111–12 (10th Cir. 1999) (admitting an incomplete videotape where an officer responsible for filming testified as to authenticity of the tape and confirmed that, “except for the deleted portion, it accurately depicted the entire episode”). In these commonplace instances, courts have required no more than satisfaction of the fair and accurate portrayal standard (or, if the recording is admitted under the silent witness theory, the “evidence as a process or system” standard)114See infra Part II.B.2 (addressing the more demanding requirement for the silent witness theory). to admit the recording.115Fed. R. Evid. 901(b)(9).
For example, the Supreme Court of Arkansas drew a careful distinction between video that had been “enhanced” by adjusting the brightness and contrast of the video and that which was “altered,” such as by changing the “face, features, or physique of someone not present in the original videotape.”116Nooner v. State, 907 S.W.2d 677, 686 (Ark. 1995); see also Louis Vuitton S.A. v. Spencer Handbags Corp., 765 F.2d 966, 973–74 (2d Cir. 1985) (finding sufficient authentication in the absence of any accusations of inaccuracy or tampering). The court dismissed the defendant’s contention that the video had been manipulated by stating that the jury had ample opportunity to determine whether any alterations were present.117Nooner, 907 S.W.2d at 686. In these types of cases, courts address both whether the alteration process distorted the image such that the resulting product is no longer authentic and whether the curation conveys a message so different from the original that it is no longer “relevant” under Rule 401.1182 Broun et al., supra note 84, § 215. For both issues, courts envision having the “original” recording to reference against;119Witkowski, supra note 80, at 272. courts rarely consider the possibility of outright forgery when considering authentication standards for admission.120Id. at 285–86 (considering various reasons for the “infrequency of challenges to digital images,” including general lack of awareness and a focus on editing, not forgery); see infra notes 201–05 and accompanying text. The rare cases when courts reject photographic evidence are when there is no authenticating witness or the witness expressly rejects the photograph as an accurate depiction.121See, e.g., United States v. Lawson, 494 F.3d 1046, 1052 (D.C. Cir. 2007) (determining that the trial court properly excluded photographs from evidence because they were not authenticated by the only witness familiar with the scene). This was the case in United States v. Lawson,122494 F.3d 1046 (D.C. Cir. 2007). where the defendant offered photographs that were excluded from evidence because the only witness at trial testified that the photographs “did not accurately reflect what he saw.”123Id. at 1052.
Because of this traditional framework, the fair and accurate portrayal standard is not a difficult hurdle to clear. A witness who testifies as to a photograph’s or video’s accuracy does not need to be the actual photographer or understand the process by which the originator created it.124See, e.g., Kooyumjian v. Stevens, 135 N.E.2d 146, 151 (Ill. App. Ct. 1956) (admitting photographs when the authenticating witness did not know when the pictures were taken); State v. Pearson, 975 So. 2d 646, 655 (La. Ct. App. 2007) (finding proper establishment of foundation even though photographer did not testify). The standard to establish a foundation is so minimal that issues concerning the possibility that the witness’s fair and accurate testimony is “limited” or “defective” or that the witness is “otherwise unsure of his perceptions” are matters saved for the jury, to which the jury must assign weight to evaluate the evidence’s credibility—not matters of admissibility with which the proponent of the evidence must grapple.12531 Charles Alan Wright & Arthur R. Miller, Federal Practice and Procedure, § 7106 (1st ed.), Westlaw (database updated Apr. 2020). Instead, the standard imposes only a “sufficient to support a finding” requirement on the proponent.126Id.
2. The silent witness theory
In addition to the pictorial communication theory, a party may also submit a photograph or video as substantive evidence; that is, the photograph or video is capable of standing on its own to convey what it depicts and, in turn, obviates the need for a witness.127See 2 Broun et al., supra note 84, § 216 (“[I]t is important for courts to acknowledge that films and videos are often not merely illustrative of a witness’s testimony, but are potential independent sources of substantive information for the trier of fact.”); see also supra note 78 (discussing similar authentication treatment for photographs and videos). Courts admit photographic evidence in this manner under the silent witness theory.1282 Broun et al., supra note 84, § 216. By treating evidence as a “potential independent source of substantive information for the trier of fact,” the silent witness theory has stricter requirements for the admission of photographic evidence than the pictorial communication theory’s requirements for admission.129Id. Evidence admitted under the silent witness theory is generally subject to Rule 901(b)(9), which allows a proponent of evidence to establish a foundation for authentication by “describing a process or system and showing that it produces an accurate result.”130Fed. R. Evid. 901(b)(9).
One of the most common examples of evidence admitted under the silent witness theory is security camera footage. Typically, when a party submits video from a closed-circuit television (CCTV) device at, for example, a bank or convenience store, a worker or expert will testify as to the reliability of the video and the process for maintaining an accurate system.1312 Broun et al., supra note 84, § 216. For example, in United States v. Rembert,132863 F.2d 1023 (D.C. Cir. 1988). the government offered no witnesses to testify that a CCTV video fairly and accurately depicted the scene; instead, a bank employee testified as to how the cameras were loaded, how the results were secured, and the internal metadata concerning the date and location of the filming.133Id. at 1028 (rejecting a heightened standard of evidentiary authentication in criminal proceedings). Courts commonly accept details of this nature when the cameras are part of a regulated system that is maintained and operated according to accepted standards, such as those of a police department or bank security system.1342 Broun et al., supra note 84, § 215.
Because evidence admitted under the silent witness theory may stand alone as substantive evidence without accompanying witness testimony, courts generally only admit it when the device and process are set up and executed in a controlled environment. Courts have accepted testimony concerning the process and system as it applies to CCTV surveillance videos as described above, as well as x-ray photography and police footage.135See, e.g., United States v. Stephens, 202 F. Supp. 2d 1361, 1368 (N.D. Ga. 2002) (admitting surveillance video under the silent witness theory after police official testified as to the process and “general reliability of the entire system”); Woodward v. State, 123 So. 3d 989, 1027 (Ala. Crim. App. 2011) (applying the silent witness theory and upholding the validity of video footage from a patrol car as a sufficiently reliable mechanism capable of accurately recording a criminal shooting); People v. Bowley, 382 P.2d 591, 595 (Cal. 1963) (holding x-ray imaging admissible under the silent witness theory since no one can testify to the accuracy of an image, because it is not possible to directly observe the inside of a body). However, digital photography that the general public personally creates falls largely beyond this threshold because it lacks a systematic and reliable scientific process and because the proponent cannot demonstrate a secure chain of custody.1362 Broun et al., supra note 84, § 215. For example, even though surveillance or police footage is digitally created, the chain of custody (generally secured through police channels) insulates the product from tampering, and therefore the footage may potentially stand on its own as substantive in ways that evidence admitted under the pictorial communication model theoretically could not.137Id.
Over the past several decades, courts have begun to test the digital boundaries of the silent witness theory. For example, in an instance where a police officer took a cell phone video recording of a CCTV surveillance video system of a convenience store, the state failed to establish a foundation when the police officer testified that his video was a fair and accurate portrayal of what the CCTV depicted.138State v. Moore, 803 S.E.2d 196, 210 (N.C. Ct. App. 2017). The officer’s fair and accurate portrayal testimony was insufficient where he could only speak to his knowledge of the depiction of his cell phone tape; in other words, the officer had no more personal knowledge that the video of the scene of the crime was a fair and accurate portrayal than anyone else.139Id. (“No witness was asked whether the video accurately depicted events that he had observed, and no testimony was offered on the subject.”). Without Rule 901(b)(9) evidence concerning the reliability of the CCTV itself, the recording was inadmissible.140Id. Cell phone videos present particularly unique challenges in the silent witness theory context because of the lack of reliability concerning the process and preparation of such videos. Courts have distinguished video recordings originating from cameras worn by an undercover police officer and prepared by state officials from videos taken by an undercover officer with a cell phone in otherwise the same context.141McFall v. State, 71 N.E.3d 383, 388 (Ind. Ct. App. 2017). In McFall v. State,14271 N.E.3d 383 (Ind. Ct. App. 2017). the Court of Appeals of Indiana addressed this very issue when the prosecution introduced evidence of a controlled drug buy using video from a confidential informant’s cell phone.143Id. at 388–89 (rejecting the trial court’s admission of the confidential informant’s cell phone footage under the silent witness theory but ultimately rendering the error harmless because the defendant “identified herself in the videos . . . 
and acknowledged that the events depicted in them occurred on [the date in question]”). Whereas normally police officers equip an informant with government owned and managed recording equipment and secure it from the informant following an operation, here the detective did not exercise control over the informant’s cell phone and filming process throughout the operation.144Id. at 388. The prosecution therefore could not attest to the accuracy of a process or system under Rule 901(b)(9) because the informant’s personal phone was not subject to the same standard operating procedures and chain of custody that the police use for typical surveillance equipment.145Id.; Fed. R. Evid. 901(b)(9). However, had the government presented an authentication witness with personal knowledge of the depiction itself, the court could have admitted the video under Rule 901(b)(1). See United States v. Richardson, 562 F.2d 476, 479 (7th Cir. 1977) (admitting bank surveillance film despite the prosecution’s inability to meet the Rule 901(b)(9) standard due to lack of secure chain of custody because eyewitnesses testified to the fair and accurate standard under Rule 901(b)(1)).
These cases demonstrate courts’ acknowledgement of the risk that digital photography poses and their hesitance to incorporate it into the silent witness theory without an authenticating witness. Despite these risks, courts have refused to incorporate any changes to the pictorial communication standard when it comes to digital photography.146See, e.g., Owens v. State, 214 S.W.3d 849, 854 (Ark. 2005) (“[W]e do not agree that this court should impose a higher burden of proof for the admissibility of digital photographs merely because digital images are easier to manipulate.”).
C. Rule 602 Caselaw Establishes a Baseline for Distinguishing Personal Knowledge from Speculation and Logically Applies to Rule 901(b)(1) Witnesses
Normally, a judge will not exclude an eyewitness if her memory or perception is limited; as long as the testimony could assist a reasonable trier of fact in establishing the facts, the court will allow the witness to testify.147See supra notes 125–26 and accompanying text. However, a judge has discretion to exclude evidence (prior to its admission) when a witness’s personal knowledge is particularly uncertain or unreliable or when there is not enough evidence that a reasonable juror could give some weight to the testimony.14827 Charles Alan Wright & Arthur R. Miller, Federal Practice and Procedure, § 6027 (2d ed.), Westlaw (database updated Apr. 2020). For example, in Nolin v. Douglas County,149903 F.2d 1546 (11th Cir. 1990). the judge did not admit a document when the witness stated that he was only “somewhat familiar with the document.”150Id. at 1552. Rule 901(b)(1) is not specific to video, and in this case the witness’s uncertainty applies more broadly to a knowledgeable witness rather than the “fair and accurate” standard for photographs. See, e.g., United States v. Crute, 238 F. App’x 903, 905–06 (3d Cir. 2007) (authenticating vehicle registration records through a knowledgeable witness); Kruse v. Hawai’i, 857 F. Supp. 741, 745–46 n.5 (D. Haw. 1994) (authenticating hospital records), aff’d, 68 F.3d 331 (9th Cir. 1995). Thus, judges must walk a fine line between the minimum amount of personal knowledge required to testify and imperfect knowledge that crosses the threshold into speculation.
This fine line determines whether a witness has the requisite personal knowledge to testify to the fair and accurate portrayal standard to establish a foundation of authenticity under Rule 901(b)(1). Since Rule 901(b)(1) does not specifically define knowledge, other sections of the Federal Rules of Evidence are instructive.151See 31 Wright & Miller, supra note 125, § 7106 (“The fact that Rule 901(b)(1) uses the word ‘knowledge’ without restrictions or modifiers suggests that authentication testimony may be based on knowledge of the sort described by either Rule 602 or Rule 702.”). The most relevant section in this context is Rule 602, which requires witnesses to have personal knowledge of the matters about which they testify. Rule 702 allows expert testimony based on “scientific, technical, or other specialized knowledge;”152Fed. R. Evid. 702. because most witnesses with potential fair and accurate portrayal testimony will not have such expertise, Rule 602’s personal knowledge requirement is a more appropriate standard for knowledge than Rule 702 in this context.
Rule 602 requires that a witness have personal knowledge of the matter about which she is testifying for the testimony to be relevant.153Fed. R. Evid. 602. The witness must demonstrate personal knowledge on the matter by a preponderance of the evidence under Rule 104(a). Miller v. Keating, 754 F.2d 507, 511 (3d Cir. 1985). Because other subdivisions of Rule 901(b) describe means of authentication based on either personal or specialized knowledge,15431 Wright & Miller, supra note 125, § 7106 (referring to Fed. R. Evid. 901(b)(5) (stating that voice may be identified based on a witness hearing it “firsthand”) and Fed. R. Evid. 901(b)(3) (explaining authentication through expert’s comparison with specimen authenticated by another)). Rule 602 and its associated caselaw apply to Rule 901(b)(1) by logical extension despite the lack of a definition of knowledge in Rule 901(b)(1) itself. Thus, examining the Rule 602 standard for personal knowledge helps articulate when a witness testifying to the fair and accurate portrayal standard has the requisite personal knowledge for Rule 901(b)(1). The Rule 602 standard helps define the line between personal knowledge shortcomings that pass the foundational requirements for a jury to consider and those that the court rejects at the foundational stage as speculative, as was the case in Nolin.155Nolin v. Douglas Cty., 903 F.2d 1546, 1552 (11th Cir. 1990).
In applying Rule 602 to determine personal knowledge, courts have long been reluctant to bar a witness from testifying merely because the court believes the witness to be obviously mistaken or dishonest.156Edmund M. Morgan, Basic Problems of State and Federal Evidence 53–54 (Jack B. Weinstein, 5th ed. 1976). The only appropriate circumstance for a court to reject a witness’s testimony is when no reasonable trier of fact could believe that a witness perceived what she claims.157Id. Courts prefer that the jury, as the trier of fact, assign weight to testimony in accordance with its perception of the witness’s reliability and other factors that aid its judgment.158See 27 Wright & Miller, supra note 148, § 6027 (summarizing the threshold for satisfying Rule 602 by noting that “[t]he judge should allow the testimony to go to the jury unless the judge concludes the foundation for personal knowledge is so weak that the testimony will be a waste of time”). Personal knowledge of objects or events under Rule 602 comprises four elements: “(1) sensory perception; (2) comprehension about what was perceived; (3) present recollection; and (4) the ability to testify about what was perceived.”159Keiser v. Borough of Carlisle, No. 1:15-CV-450, 2017 WL 4075057, at *5 (M.D. Pa. Sept. 14, 2017); see also 2 John Henry Wigmore, Evidence in Trials at Common Law § 478 (James H. Chadbourn ed., 1979) (generally outlining observation/perception, recollection, and communication as requirements for testimonial assertions). Each of these four elements is required for a judge to allow a jury to hear a witness’s testimony.160Keiser, 2017 WL 4075057, at *5.
The first requirement for personal knowledge under Rule 602 is sensory perception, which courts commonly label “observation.”16127 Wright & Miller, supra note 148, § 6023. Although this shorthand most immediately invokes sight, sensory perception may be based on any of the five senses.162See Fox v. Order of United Commercial Travelers of Am., 192 F.2d 844, 846 (5th Cir. 1951) (“A witness may testify to what he hears, feels, tastes, and smells, as well as to what he sees, and regardless of whether he sees anything.”). To satisfy the sensory perception element, the witness must have the ability to perceive and must in fact have perceived what she is testifying to; the witness’s ability, however, may be limited, or even minimal.163See, e.g., Auerbach v. United States, 136 F.2d 882, 885 (6th Cir. 1943) (witness identified defendant’s voice “to the best of his belief” and acknowledged that “it was possible that he could be mistaken”). Courts have long recognized that the personal knowledge standard to admit a witness’s testimony does not require positive or absolute certainty.164See, e.g., United States v. Hickey, 917 F.2d 901, 904–05 (6th Cir. 1990) (“Despite the fact that . . . [the witness’s] perception was sometimes impaired, a reasonable or rational juror could believe that [the witness] . . . perceived the course of events to which [he] testified.”); United States v. Evans, 484 F.2d 1178, 1181 (2d Cir. 1973) (applying personal knowledge standards to the competency of a witness to make an in-court identification).
For a court to exclude a witness for lack of sensory perception, the witness must not have been able to perceive relevant facts directly. For example, in State v. Tutt,165622 A.2d 459 (R.I. 1993). when “it was dark, [and the witness] could n[o]t make out exactly what was happening,” the court precluded the witness from testifying because of an inability to visually perceive what she purported to testify to.166Id. at 462. Similarly, in McCrary-El v. Shaw,167992 F.2d 809 (8th Cir. 1993). the Eighth Circuit affirmed the trial court’s exclusion of the deposition of a witness who claimed to have seen a confrontation between the defendant and several correctional officers from an adjoining jail cell.168Id. at 811. The court reviewed a diagram of the jail layout and found that no reasonable person could conclude that the witness could see anything of relevance.169Id. As these cases demonstrate, the personal knowledge standard tolerates limitations and gaps in a witness’s perception but not a complete inability to perceive.17027 Wright & Miller, supra note 148, § 6023.
The second element of personal knowledge is recollection, which, like sensory perception, does not need to be perfect to satisfy the test. Of course, no human memory is flawless. Incomplete or limited memory is usually sufficient to satisfy this requirement and is generally a matter to which a trier of fact must assign weight.171Tippens v. Celotex Corp., 805 F.2d 949, 953–54 (11th Cir. 1986). For example, in United States v. Sinclair,172109 F.3d 1527 (10th Cir. 1997). the court admitted the testimony of a drug user despite allegations of a “clouded memory,” relying on its confidence in the jury’s traditional role of determining witness credibility.173Id. at 1537.
There is, however, an important line that a witness crosses with too many memory or perception gaps; eventually, the witness can only convey the testimony coherently by filling the gaps with hearsay or speculation.174See 2 Wigmore, supra note 159, § 659 (“[T]he law may reject testimony which appears to be founded on data so scanty that the witness’[s] alleged inferences from them may at once be pronounced . . . extreme.”). Witnesses commonly attach caveats to the accuracy of their memory, such as “I believe,” “to the best of my recollection,” or “I cannot be positive, but I think.”175See Mason Ladd, Expert and Other Opinion Testimony, 40 Minn. L. Rev. 437, 437, 440 (1956) (examining the evolution of Minnesota’s opinion standards for both lay and expert witnesses against the Uniform Code of Evidence prior to the adoption of the Federal Rules of Evidence). The critical threshold, which the trial judge wields tremendous latitude in determining, is where the witness can only convey the narrative of her testimony by filling relevant gaps with speculation.1762 Wigmore, supra note 159, § 659. At this point, it is proper for a judge to exclude the testimony as speculative.177Id. The speculation threshold is similar for the recollection and perception components of personal knowledge. The witness in McCrary-El could not convey a complete narrative without speculation because he could not perceive key elements of the story due to his lack of vantage point from which to observe the relevant events;178McCrary-El v. Shaw, 992 F.2d 809, 811 (8th Cir. 1993). the Sinclair witness, on the other hand, could convey a complete story, even if the opposing party called his ability to recall into question, because he was able to perceive the subject of his testimony.179United States v. Sinclair, 109 F.3d 1527, 1536–37 (10th Cir. 1997).
The key element that distinguishes these cases is whether the ability to perceive or remember is essentially nonexistent or merely limited, distorted, or otherwise imperfect.
Rule 602’s third element is comprehension. Even when a witness perceives an event through direct sensory perception, she must still comprehend what she sees to have personal knowledge to testify on the matter.18027 Wright & Miller, supra note 148, § 6023. Again, a witness’s comprehension does not need to be perfect. For example, a court may admit a child’s testimony, even if she did not fully understand what was happening, so long as the other elements are met.181See Sauer v. Exelon Generation Co., 280 F.R.D. 404, 405, 407 (N.D. Ill. 2012) (refusing to exclude deposition of a child because she had “difficulty . . . remembering, communicating and understanding”). A witness’s comprehension of her perceptions will never be without inference, as a natural degree of inference is always present in human comprehension.18227 Wright & Miller, supra note 148, § 6023. To understand sensory perceptions, a person has no choice but to connect those perceptions to past experiences and draw inferences about what she perceives.183Id.; see also United States v. Joy, 192 F.3d 761, 767 (7th Cir. 1999) (“Because most knowledge is inferential, personal knowledge includes opinions and inferences grounded in observations or other first-hand experiences.”). Humans must use some inferences to make sense of their world, or else testimony would consist only of a “description of the chemical and electrical effects of perception on the witness’[s] brain,” which no witness is consciously aware of. 27 Wright & Miller, supra note 148, § 6023. Ultimately, the judge controls the amount of latitude to grant to a witness by either requiring more literal perceptions or allowing more inferences to describe the events that the witness perceived.184See Visser v. Packer Eng’g Assocs., 924 F.2d 655, 659 (7th Cir. 1991) (stating that personal knowledge includes inferences and some opinions because “all knowledge is inferential”). 
The balance between pure sensory perception and the inferences required to comprehend what a person perceives can reach esoteric levels beyond the intent of both the personal knowledge standard and this Comment.
The final element is the ability to testify based on the first three components. This is closely related to the third element of comprehension, but refers to the witness’s comprehension at the time of testimony rather than at the time of perception.18527 Wright & Miller, supra note 148, § 6023. For example, when a witness has been hypnotized to refresh her memory or has suffered a brain injury since the event at issue, she may no longer be able to comprehend the line of questioning or her perceptions of the event, even though she understood the event at the time she perceived it.186Id. If she is not able to comprehend at the time of questioning, she cannot satisfy the personal knowledge requirement.187Id.
Rule 602’s personal knowledge standard for direct testimony helps illustrate the knowledge that Rule 901(b)(1) requires. Thus, a witness must meet Rule 602’s personal knowledge elements to testify as to whether photographic evidence is a fair and accurate portrayal.188Fed. R. Evid. 602, 901. To have the requisite knowledge, the witness must base her fair and accurate portrayal judgment on the direct use of her own senses, must have comprehended what she perceived at the time as well as at the time of her testimony, and must have a recollection of that prior perception. The witness is, of course, entitled to an imperfect memory as well as limitations in perception.189See supra notes 163–70 and accompanying text.
III. Authenticating Witnesses Can No Longer Reliably Testify to the Fair and Accurate Portrayal Standard to Authenticate Photographic Evidence
Over the past twenty-five years, several scholars have noted the risk that evidentiary standards are too low to address advances in digital photography,190See Witkowski, supra note 80, at 285–87 (arguing in 2002 that the standard to admit digital images was insufficient); see also Sharon Panian, Truth, Lies, and Videotape: Are Current Federal Rules of Evidence Adequate?, 21 Sw. U. L. Rev. 1199, 1205–14 (1992) (highlighting common distortion problems with misleading computer graphics and edited video tapes). but they have made little progress in motivating any changes to the standards.191See, e.g., Owens v. State, 214 S.W.3d 849, 854 (Ark. 2005) (refusing to alter the standard for digital photographs). Two factors have historically mitigated the impact of such a low bar: first, the court could rely on expert witnesses to assist with authenticity determinations, and second, it was still extremely difficult to create high quality fake video. The dawn of the deepfakes era brings this deficiency to the forefront with a new sense of urgency.192See supra Part I (explaining the believability of human likenesses deepfakes and the unique detection challenges that make expert witness authentication more challenging for deepfakes than for other means of fraud). The proliferation of deepfakes technology renders obsolete the assumptions upon which the fair and accurate portrayal test relies; witnesses can no longer meet the fair and accurate portrayal standard within the legal standard of personal knowledge required to authenticate video evidence.
The unworkability of the fair and accurate portrayal standard is born out of a convergence of several factors. Deepfakes vastly increase the likelihood that authenticating witnesses will be unable to identify material changes from the actual scene that the video depicts.193See supra Part I.A (examining the alarming level of precision and realism that programmers using generative adversarial networks can create). Moreover, fake video is more likely to corrupt an authenticating witness’s memories to lead her to actually recall the falsehoods that the video depicts.194See supra Part I.C (exploring the tremendous effect that video has on human recollection by creating suggestive false memories). The authenticating witness’s inability to detect alterations from what she observed and the possibility of false memories together leave the witness wholly unable to attest to a video as a fair and accurate depiction. The only way to attest that a video is a fair and accurate portrayal is by speculating on vast amounts of detail that, critically, witnesses are likely to believe to be their own memory when the court shows them a fake video.195See Wade et al., supra note 67, at 899. When combined with the disconnect inherent in the pictorial communication theory,196See infra Part III.A (arguing that illustrative evidence often conveys substantive effects beyond the scope of the pictorial communication theory). the result is a high probability of the court presenting to a jury fraudulent substantive evidence that has been authenticated by a witness without proper personal knowledge.
A. Muddled Theories: Video Causes Pictorial Communication Evidence to Leach into Substantive Evidence
The standard for admitting photographic evidence without an accompanying witness is far more comprehensive than when a witness is available to testify that the visual is a fair and accurate depiction.197See supra Part II.B.1–2 (comparing the legal standard for the pictorial communication theory and silent witness theory). However, the natural result of society’s familiarization with and trust in photography and video recordings is that illustrative evidence’s impact perpetually bleeds over into substantive effect; scholars have articulated this concern for some time, yet the problem remains.1982 Broun et al., supra note 84, § 215; see also Robert D. Brain & Daniel J. Broderick, The Derivative Relevance of Demonstrative Evidence: Charting Its Proper Evidentiary Status, 25 U.C. Davis L. Rev. 957, 998, 1018 (1992) (examining the evolving effect of demonstrative evidence and proposing modifications to Rule 401 relevance standards). Under the pictorial communication theory, photographic evidence should, strictly speaking, “illustrate the witness’[s] testimony, . . . add[ing] nothing further.”199Jessica M. Silbey, Judges as Film Critics: New Approaches to Filmic Evidence, 37 U. Mich. J.L. Reform 493, 500–03 (2004) (highlighting the “jurisprudential anxieties” inherent in mischaracterizing demonstrative evidence). But this belies the natural human experience of consuming photographic evidence—such evidence conveys more information to the trier of fact than the witness could possibly have seen or heard but also may not have picked up every detail that the witness actually perceived. This dilemma is both technical, in the sense that a photograph is “not a replication but a representation, a constructed—and hence fallible—image,”200See Jennifer L. Mnookin, The Image of Truth: Photographic Evidence and the Power of Analogy, 10 Yale J.L. & Human. 
1, 7, 23 (1998) (examining the “kaleidoscopic understandings of the meaning of photographic evidence” and judicial attempts to govern evolving visual technology). and experiential, in that the witness could not possibly recollect every single detail a recording conveys and simultaneously may very well recall information that the recording device did not capture. Courts have acknowledged the risk inherent in “[t]he masking of the substantive effect of photographs under the rubric of ‘illustrative evidence’” as lacking “conceptual honesty.”2012 Broun et al., supra note 84, § 215; see also Joseph, supra note 89, § 5.02[c] (analyzing the terminology separating the pictorial communication and silent witness theory as “unfortunate” because photographic evidence “introduced by means of the fair-and-accurate standard need not be given merely illustrative effect but may be, and often are, entitled to be given substantive effect”). The resulting effect is that photography admitted under the low standard of the pictorial communication theory can easily have the practical effect of substantive evidence as if admitted under the silent witness theory but without meeting the more stringent requirements of Rule 901(b)(9).202Silbey, supra note 199, at 531. “[C]ourts in their rulings admitting or excluding filmic evidence frequently evaluate film as a demonstrative aid only to later marshal the film toward substantive ends.” Id. This occurrence is rooted in the judicial system’s confidence in the reliability of the photographic process, despite the fact that film theory teaches camera operators how to deliberately invoke reactions through a host of techniques.203See id. at 531–32 (noting the judicial system’s confidence in the transparency of film despite “a century of film theory and history teaching the opposite”); see also id. at 548–49 (finding that camera operators make “each spectator feel as if he or she is an eyewitness, despite that impossibility”).
Although courts acknowledge an underlying risk to digital photography and video, the fair and accurate portrayal standard has nonetheless been seemingly immune to reconsideration. The Supreme Court of Arkansas, for example, has recognized the risk that it is easier to manipulate digital, rather than traditional, images yet it refused to impose a higher burden of proof for their admissibility when a defendant challenged the admission of surveillance video under the fair and accurate portrayal standard.204Owens v. State, 214 S.W.3d 849, 854 (Ark. 2005). The lack of evolution of the admissibility standard is in part because challenges to the veracity of digital images are rare.205Witkowski, supra note 80, at 285. This is likely attributable to the legal community’s lack of awareness of the risk inherent in digital images compared to older technology.206Id. at 286. Additionally, when a party does challenge a digital image, the challenge typically addresses an overt enhancement of the image rather than the image’s authenticity.207Id. Moreover, courts may fear that elevating admission standards for photographic evidence will stifle the efforts of law enforcement, whose use of digital equipment during crime scene investigation has become commonplace,208Id. at 286–87, 287 n.87. or will slow the trend towards the convenience that comes with increased computer use in litigation.209Id. at 287.
The consequence of the natural bleed-over from illustrative to substantive evidence is that digital photography and video, admitted under the easily satisfied standard of Rule 901(b)(1), tend to convey substantive fact far beyond what the legal standard assumes or intends.
B. Witnesses Can No Longer Meet the Personal Knowledge Standard of Rule 602 to Attest to Photographic Evidence as a Fair and Accurate Depiction of a Scene
Establishing a foundation for admitting photographic evidence under the pictorial communication theory requires witness-with-knowledge testimony that the photograph or video is a fair and accurate depiction of the scene that it illustrates; to attest to this standard, a witness must be able to satisfy Rule 602’s personal knowledge requirement.210See supra Parts II.B.1, II.C. Because witnesses are unable to perceive alterations or fabrications in deepfake videos, they can no longer determine whether the video’s depiction is a fair and accurate portrayal of their memory. Using the personal knowledge standard articulated in Rule 602 caselaw, witnesses will commonly fail the recollection element of personal knowledge that a video is a fair and accurate portrayal.211See supra text accompanying notes 172–73; supra notes 67, 71, 192 and accompanying text. The personal knowledge standard allows for significant gaps in the ability to recollect, but it does not permit gaps so central to the testimony that the testimony crosses the threshold into speculation.212See supra notes 174–79 and accompanying text. Because witnesses cannot possibly recall all of the detail conveyed in a photograph or video, their limitations are likely to go beyond fuzziness or uncertainty and become speculative.
The underlying problems with the fair and accurate standard did not emerge with the invention of deepfakes; these problems have existed ever since photoshop became a commonly used verb.213Concerns over exaggerations or distortions in digital photography have been articulated in response to the emergence of photoshop. Witkowski, supra note 80, at 283–87. The potential for manipulation skyrockets with the ability to realistically create human likenesses in videos, not just still photography. See supra Part I.A (addressing the fidelity of human likenesses in deepfakes). Rather, deepfakes critically reduce the already limited effectiveness of authentication witnesses. Deepfakes exacerbate the inability of witnesses to determine their own recollection limitations and communicate the extent to which their limitations affect their ability to attest to the fair and accurate portrayal standard. Deepfakes’ lifelike fidelity reduces the likelihood that authentication witnesses will reliably rise to the task of stating either that something looks different from the way they remember it or that they do not recall it at all; the visuals are too convincing and too likely to take advantage of the suggestibility flaw inherent in our memories.214See Bennett, supra note 61, at 1335–37 (describing how human memory is imperfect and susceptible to suggestions); Wade et al., supra note 67, at 899 (using fabricated video in a psychological study to demonstrate witness suggestibility). Thus, the speculation that occurs in blanketing the entire depiction as fair and accurate crosses the threshold of acceptable gap filling.215See supra notes 174–79 and accompanying text.
Even a witness with no intention of deceiving the court will be unable to meet the threshold. The following example is illustrative. A criminal defendant offers a video made using a commercial iPhone. It depicts the defendant at an event with a date and location known to the public, such as a concert or other public gathering, thus providing an alibi. A witness who was at the event may recognize a variety of features that are true: the concert stage, the events transpiring in the background, or other individuals present. But the witness will not be able to discern small alterations, such as the insertion of the defendant’s likeness onto another individual who was actually present at the event. The proponent of the evidence cannot ask whether the witness recalls every detail in the video; the amount of detail makes the task inconceivable for both the proponent and witness. Instead, the proponent asks the witness to testify whether the video is a fair and accurate portrayal of the scene that she remembers. Following the witness’s fair and accurate portrayal testimony, the jury will see evidence with small but significant alterations.216Witkowski, supra note 80, at 282 n.65.
The blending of pictorial communication and silent witness theories sheds light on why a witness in this context can no longer meet the recollection element of personal knowledge.217See supra Part III.A. The witness here likely has a variety of memories from the event depicted. She may remember which speaker or entertainer the event featured, some details on how the event was laid out, or what the stage looked like. By recalling any of these factors, she likely feels comfortable attesting to the video as a fair and accurate portrayal of the scene. If asked to testify whether she remembers these specifics, she certainly passes the personal knowledge standard for any of them, even if she expresses some uncertainty.218See Ladd, supra note 175 and accompanying text (noting that statements like “I think” do not render witness testimony excludable). However, if asked specifically whether she saw the defendant at the event, the witness may have no recollection. Nonetheless, the witness testifies that the entirety of the scene is a fair and accurate depiction of her memory. Despite the proponent offering the video under the pictorial communication theory, the jury sees all of the surrounding details encompassed by the video, whether the witness recalled them or not. The witness cannot possibly recollect that volume of detail if the court (unrealistically) examined her recollection of each and every individual detail of the video. The only way for the witness to testify that the video is a fair and accurate portrayal is speculation because of the likelihood that the witness cannot detect whether changes have been made. 
Of course, she may specifically state that she remembers a manipulated part of the video and identify it, but in doing so, she has authenticated substantive facts that she did not actually remember and likely has no reason to suspect that there were any limitations on her fair and accurate assessment.219See Wade et al., supra note 67, at 904–06 (illustrating how few witnesses suspect that video evidence may be doctored).
Witnesses’ inability to perceive changes in fake video is twofold: not only are witnesses unlikely to be able to perceive changes, but they are also willing to affirmatively remember portrayals in video that were altered and did not actually take place.220Id. Critically, a witness’s inability to perceive changes in the depiction does not reflect in her understanding of her own perceptions. The psychological suggestibility that fake video has on memory221See supra Part I.C. warps the reliability of an authenticating witness. Professor Kimberly Wade’s psychological study is a convincing demonstration that photographs and video are powerful tools to refresh a witness’s memory, even when the memory that the imagery invokes never happened.222See Wade et al., supra note 67, at 904–06. The suggestibility problem inherent in fake video and fake narratives vastly increases the likelihood that a witness believes that she has the personal knowledge to authenticate a video, even absent any intended deception by the witness.223Bennett, supra note 61, at 1357–58. One study showed that human memory is so susceptible to the effect of suggestibility that even using the definite article “the” instead of the indefinite article “a” in a question dramatically affects the witness’s likelihood of recalling seeing an object. See id. at 1357 n.144 (citing Elizabeth F. Loftus & Guido Zanni, Eyewitness Testimony: The Influence of the Wording of a Question, 5 Bull. Psychonomic Soc’y 86, 87–88 (1975) (showing a significant increase in the percentage of affirmative test subject responses to the questions “Did you see the [object]?” and “Did you see a[n] [object]?”)).
Suggestibility, along with the precision of deepfakes, poses both technological and psychological constraints on a witness’s determination of fair and accurate portrayal. Such testimony does not merely represent a limitation on the witness’s recollection capability when attesting to the authenticity of a video—it represents a complete inability to make the determination of her personal knowledge, which pushes a witness’s personal knowledge past the level of uncertainty normally allowed to establish a foundation. The previous example of the alibi video is distinct from examples in which witnesses were unsure of their perceptions, had limited sensory perceptions available, or had incomplete information.224See supra notes 165–79 and accompanying text. In each of these scenarios, there was some ability for the witness to recognize and articulate the limitations of her personal knowledge, whether in perception or recollection.225See supra notes 165–79 and accompanying text. Here, however, a witness can only label the entire video or scene as a fair and accurate depiction by using those facts that she does recall from the scene and augmenting them with speculation. This is particularly dangerous when combined with the psychological effect of suggestibility, which is especially strong with video—the witness is likely to convey inherently speculative fair and accurate depiction testimony confidently and without doubts as to her recollection capability.226See supra notes 67–71 and accompanying text.
This analysis does not characterize the evidence’s probative value itself as speculative—the video, if authenticated, may be highly probative or speculative in its own right depending on what it depicts and what its proponent intends to demonstrate to the jury.227See Fed. R. Evid. 403 (requiring the court to balance relevancy against the risk of undue prejudice). Here, the witness’s fair and accurate portrayal testimony, not the evidence, becomes speculative—that is, it has little probative value in establishing the foundation for authentication.
At first glance, this characterization of the witness’s fair and accurate portrayal testimony seems to fly in the face of the strong tradition of a minimalist standard in which the proponent does not need to eliminate “all possibilities inconsistent with authenticity, or to prove beyond any doubt that the evidence is what it purports to be.”228United States v. Gagliardi, 506 F.3d 140, 151 (2d Cir. 2007) (quoting United States v. Pluta, 176 F.3d 43, 49 (2d Cir. 1999)); see also supra notes 124–26 and accompanying text. However, there is an important distinction from the long list of examples of courts admitting testimony based on shaky memories and imperfect observations229See supra notes 165–73 and accompanying text.: in each of these examples, the witness can qualify her imperfect memories by articulating the degree of limitation or imperfection. She can communicate how “positive” or not she is, or how well she was able to perceive the facts by explaining, for example, how dark it was, how far she could see, or whether she could make out facial features. She could also describe her vantage point and identify physical or environmental limitations. Ultimately, these examples all provide a minimal articulable basis for recollection and perception as a foundation. The deepfakes problem, compounding pre-existing issues with digital photography, creates an authenticating witness who cannot articulate her level of confidence or capability when it comes to labeling an entire video sequence as fair and accurate; the potential forgeries are too high quality,230See supra Part I.A. and psychological factors create a sense of certainty that does not reflect the true degree of speculation.231See supra Part I.C. The result is that the only way for a witness to testify that a video sequence is a fair and accurate depiction is by augmenting her memory with speculation, even if she does not realize she is doing it.232See Wade et al., supra note 67, at 899–900.
C. Digital Photographic Evidence Warrants a More Stringent Means of Authentication
Because witnesses will no longer be able to meet the legacy standard of Rule 901(b)(1)’s knowledgeable witness by attesting that a video is a fair and accurate portrayal, courts need to look elsewhere for a sufficient finding that photographic evidence is what its proponent claims it is. This new standard does not necessarily replace Rule 901(b)(1), which is still applicable for a variety of other forms of evidence.233For example, documents that are not self-authenticating may still fall under the knowledgeable witness standard of Rule 901(b)(1) or other means of Rule 901(b). Instead, a proposed new section would specifically govern the unique challenges that digital photography in the modern age presents:
(Proposed New) Rule 901(b)(11): Before a court admits photographic evidence under this rule, a party may request a hearing requiring the proponent to corroborate the source of information by additional sources.
As mentioned earlier, the processes offered in Rule 901(b) to establish a foundation for authentication are not exhaustive; Rules 901(b)(1) and 901(b)(9) are not the exclusive options.234Rice & Katriel, supra note 95, § 7.02[A][a]. A proponent may also use circumstantial evidence to establish a foundation for authentication without adhering to one of the processes enumerated in Rule 901(b).235Id. This new rule essentially codifies an existing means of authentication and requires it for photographic evidence. Thus, even if the proponent cannot produce a witness with personal knowledge, methods of proving authenticity “can be infinite in variety, limited only by the circumstances pertaining in the particular case.”236§ 5 Jack B. Weinstein & Margaret A. Berger, Weinstein’s Federal Evidence § 901.03 (Mark S. Brodin & Matthew Bender eds., 2d ed. 2020), LexisNexis.
A Rule 901(b)(11) hearing would consider authentication factors beyond the bare bones requirement of 901(b)(1). A starting point for elements for the court to consider at this stage is the presence of additional corroborating evidence, as the court would consider in instances where the proponent establishes its foundation outside of the traditional 901(b)(1) or 901(b)(9) paths.237See Marie-Helen Maras & Alex Alexandrou, Determining Authenticity of Video Evidence in the Age of Artificial Intelligence and in the Wake of Deepfake Videos, 23 Int’l J. Evidence & Proof 255, 258–59 (2019) (recommending requiring corroborating evidence to authenticate video in an article addressing deepfakes in the context of probative value rather than personal knowledge).
Returning to the example above, if the government called for a Rule 901(b)(11) hearing to challenge the alibi video that the defendant submitted, the court would require more than a knowledgeable witness to establish a foundation for authenticity. For example, if the proponent offered a ticket stub or other circumstantial evidence of the defendant’s attendance, it would corroborate the authenticity of the video. The proposed rule would not rule out the utility of the witness through direct testimony. If a witness testified that she personally saw the defendant at the alibi event (a specific observation that a witness is far more likely to recollect concretely) as opposed to whether the entire scene is a fair and accurate portrayal, then the witness would easily meet the personal knowledge standard.
D. Increased Scrutiny Prior to Admission Is Worth Risking Excluding Relevant Evidence Because of the Heightened Risk of Jury Prejudice Associated with Photographic Evidence
The alibi example may seem redundant; if there was a witness to corroborate a defendant’s alibi, then why does the defendant need the video in the first place? The more troublesome instance is where the video is the only source of evidence concerning the alibi, whether because the videographer is unavailable for some reason or is a criminal defendant herself and unwilling to testify at trial.238See U.S. Const. amend. V. The proposed rule likely poses a threat to the volume of digital media submitted in court. The immediate counterargument to elevated foundational standards for authentication of photographic evidence is that a jury can consider these factors at trial just the same as the court can in a preliminary hearing. After all, nearly all forms of evidence, from written documents to oral assertions, are vulnerable to the potential for fraud; the system depends on a jury (with help, if necessary, from expert witnesses) to assign weight to evidence based on credibility and relevance.
But the heightened risk of forgery inherent in deepfakes warrants heightened admission standards. Photography, and to a greater extent, video, have a stronger effect than other forms of evidence; they cannot be so easily dismissed once seen.239Mnookin, supra note 200, at 2–3. “The photograph, in particular, has long been perceived to have a special power of persuasion, grounded both in the lifelike quality of its depictions and in its claim to mechanical objectivity. Seeing a photograph almost functions as a substitute for seeing the real thing.” Id. at 1–2 (footnotes omitted); see also Wade et al., supra note 67, at 899. While the emotional power of photographic, and especially video, evidence is generally thought of as an issue of probative value for courts to consider under Rule 403—such as when evidence is relevant but contains extremely graphic content that renders it unduly prejudicial—suggestibility is not the type of emotion that typically factors into the Rule 403 calculus.2402 Broun et al., supra note 84, § 215.
In the context of questionable or competing forms of evidence, juries have a tendency to cast aside other, less interesting forms of evidence when presented with the ease and convincing nature of viewing photographic evidence.241See Cynthia A.R. Woollacott, Evaluating Video Evidence, 14 L.A. Law. 24, 25 (1991) (considering various issues balancing probative value against the risk of undue prejudice); see also Thomas v. C.G. Tate Constr. Co., 465 F. Supp. 566, 571 (D.S.C. 1979) (articulating the court’s concern over video’s “dominating effect [that] will distract the jury from its proper consideration of other issues they will be called on to decide” because of how the video will “stand out in the minds of the jury”). Juries are also remarkably poor at adhering to limiting instructions242See, e.g., Panian, supra note 190, at 1215 (citing Harry Kalven, Jr. & Hans Zeisel, The American Jury, 417–27 (1971) (discussing the tendency of juries to stray from judicial instructions)). or even admonishments to disregard inadmissible evidence.243See 1 Joel D. Lieberman & Daniel A. Krauss, Jury Psychology: Social Aspects of Trial Processes 75–89 (2009) (examining juries’ difficulty with limiting instructions and analyzing the “backfire effect” of inadmissible evidence already seen by juries). In fact, they also “paradoxically pay greater attention to information ruled inadmissible than if the judge had not drawn attention to the admissibility of the information and simply allowed it into evidence.”244Id. at 79. Thus, even if a party casts doubt on the authenticity of photographic or video evidence, once the court admits it, the vivid images remain in a jury’s mind. 
By virtue of video’s emotional effect and the tendency to prioritize it above other forms of evidence, the risk of waiting for a jury to consider initial corroborating evidence concerning a video’s authenticity justifies the court’s consideration of these factors prior to admission.245See Woollacott, supra note 241, at 25.
A preliminary hearing to consider circumstantial authentication factors does not solve the deepfakes evidentiary crisis—but it does mitigate it. The proposed standard for establishing a foundation would still be limited and does not render photographic evidence forgery-proof; a jury still ultimately determines credibility and weight of the evidence that is admitted. Because of the challenges in creating effective detection measures (and the especially worrisome challenge that such measures will improve the forgery process),246See supra Part I.B. regulation and potential criminal solutions are in order to address deepfakes on a larger scale and stem their potential entry into the courtroom.247See, e.g., Caldera, supra note 11, at 177–78; de Zayas, supra note 72; Harris, supra note 11, at 102–03; Spivak, supra note 11, at 340–41. Until then, a preliminary hearing process would bolster the confidence in video evidence for a jury to consider, rather than allowing all photographic evidence to pass the foundational stage with a testimonial witness who lacks the requisite personal knowledge to attest to the evidence’s validity.
The age of machine learning has contributed to human achievements and triumphs equaled only by the risk that it creates when placed in the wrong hands.248See Charles Towers-Clark, AI Diagnosis Tool Bridges the Gap Between Doctors and Patients, Forbes (Feb. 13, 2019, 12:03 PM), https://www.forbes.com/sites/charlestowersclark/2019/02/13/ai-diagnosis-tool-bridges-the-gap-between-doctors-and-patients [https://perma.cc/E9L2-UM8A]; James Vincent, Twitter Taught Microsoft’s AI Chatbot to Be a Racist Asshole in Less than a Day, Verge (Mar. 24, 2016, 6:43 AM), https://www.theverge.com/2016/3/24/11297050/tay-microsoft-chatbot-racist [https://perma.cc/XA23-Y9XL] (describing how Twitter users manipulated a publicly accessible artificial intelligence chatbot). Unfortunately for our trusting eyes, this atom cannot be unsplit, and artificial intelligence-enabled video creation is likely here to stay. As regulators scramble to address the risks posed by fake video created through GAN techniques, the legal standard for authentication of video evidence has fallen behind; evidentiary standards need to evolve to accommodate our changing world. The result will likely be a reduction in the reliance on photographic evidence in court after nearly a century of the steady rise in confidence and reliance upon photographic evidence to capture moments lost to human memory.