Data Privacy Regulations and Technological Advancements
By: Swajay Dixit
Abstract
The fundamental perception of data has evolved over time, with every aspect of our lives now being transformed into data that big corporations strive to capture and utilise for their growth. This often results in unethical and inappropriate actions leading to breaches of privacy. To prevent these violations, companies should be mandated to adhere to various international regulations and compliance standards. If they fail to do so, authorities should draw from international laws and regulations to take appropriate action against these firms.
Keywords: Data Privacy, Big Data, Corporations, Ethical Data Use, International Regulations, Privacy Breach.
Introduction
Our basic understanding of data is that it is a set of information converted into binary digits for further processing. However, for big data giants, data has evolved into a form of currency. Every bit of information, including the details of a living human, is transformed into data. This data encompasses lifestyle habits, eating routines, sleep cycles, expenditure records, career plans, and even retirement strategies. Big tech and data companies often store their consumers’ data in exchange for using their services. These bits and pieces of information, detailing our behaviours and traits, are stored in data clouds through data warehousing.
This data can be used in both ethical and unethical ways. Ethically, data is used to enhance market competition. For instance, insurance companies gather data on the most purchased insurance covers in the market, analyse this data, and create similar packages or counteroffers to increase sales and compete with other insurance companies. Data analysis also helps companies identify areas for service improvement.
Data can also be used unethically, and its large-scale collection carries the risk of data breaches. Companies collect detailed information about our lives, which can lead to attempts to influence or control us. The massive collection and processing of data increase the risk of data privacy breaches. A data privacy breach occurs when unauthorised parties access sensitive or confidential information. This issue is a global concern affecting individuals, organisations, and governments.
When learning algorithms are applied to large volumes of data, the result is what is commonly called artificial intelligence (AI). Big data giants utilise large datasets for machine learning and AI development to gain insights into consumer behaviours and preferences. AI technology provides numerous benefits but also raises concerns about personal data usage and privacy rights.
Technological Advancements Leading to Challenges in Data Privacy:
Continuous technological advancements in data science and AI have given rise to new issues related to data privacy and the use of AI in data breaches. Most AI chatbots, such as ChatGPT or Gemini, are based on large language models (LLMs). These models are trained on very large volumes of input data, encode patterns from that data in the model's parameters, and use those learned patterns to generate outputs. Other well-known LLMs include Titan and Falcon 40B.
Every AI-based chatbot and application is built on an LLM. However, companies that lack the resources to develop their own LLMs often rely on those developed by major tech giants, operating under contractual obligations. This practice, while necessary for some businesses, increases the risk of unauthorised data access by the owners of these LLMs.
Data breaches from cloud sources are a significant concern. Big tech companies collect vast amounts of data from various sources. Initially, this data was stored on company servers. However, as storage needs expanded to terabytes and petabytes, data clouds were developed to manage these large volumes. Without a robust multilayer security system, cloud storage can lead to major data leaks: where access controls are missing or misconfigured, unauthorised access may require nothing more than the cloud URL (Uniform Resource Locator). Additionally, automated cyberattacks carried out with AI bots can further compromise data security.
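The risk of access through a bare cloud URL can be contrasted with one common mitigation: a time-limited, signed URL. The following is a minimal Python sketch of that idea, not a description of any particular cloud provider's mechanism. The names `SECRET_KEY`, `sign_url`, and `is_valid`, and the `storage.example.com` address, are hypothetical and introduced only for illustration.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical secret held only by the storage service (illustration only;
# a real key would never be hard-coded in source).
SECRET_KEY = b"demo-secret-for-illustration-only"


def _signature(base_url: str, expires_at: int) -> str:
    """HMAC-SHA256 over the URL and its expiry time, as a hex string."""
    message = f"{base_url}|{expires_at}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def sign_url(base_url: str, expires_at: int) -> str:
    """Append an expiry timestamp and a signature to base_url."""
    query = urlencode({"expires": expires_at, "sig": _signature(base_url, expires_at)})
    return f"{base_url}?{query}"


def is_valid(signed_url: str, now: int) -> bool:
    """Accept the URL only if the signature matches and it has not expired."""
    parsed = urlparse(signed_url)
    params = parse_qs(parsed.query)
    try:
        expires_at = int(params["expires"][0])
        signature = params["sig"][0]
    except (KeyError, ValueError):
        return False  # unsigned or malformed URL: the bare address is not enough
    base_url = f"{parsed.scheme}://{parsed.netloc}{parsed.path}"
    expected = _signature(base_url, expires_at)
    # Constant-time comparison avoids leaking information through timing.
    return hmac.compare_digest(signature.encode(), expected.encode()) and now < expires_at
```

Under this scheme, knowing the address of a stored file is not sufficient on its own: a request without a valid, unexpired signature is rejected. This is only one layer; it does not replace encryption or monitoring of access points.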
Companies frequently use personal and sensitive data to train and test their LLMs, yet there is a lack of transparency regarding these practices. There is no public disclosure about the data sources or the methods of data collection employed by tech corporations. Users provide consent for specific applications or functions, but there is ambiguity about whether this data is being used solely for the intended purpose or for future projects, raising concerns about data breaches.
The issue of data privacy breaches presents complex legal questions. Data privacy is a global problem, with the right to privacy enshrined as a fundamental right in most democracies. Consequently, violations of data privacy can be seen as violations of fundamental rights on a global scale. This underscores the need for stringent data protection measures and transparency from tech companies to ensure the ethical use of data and the protection of individual privacy rights.
Suggestions:
To address these challenges, tech giants should be placed under stringent contractual obligations and regulations regarding data privacy laws. Companies should be compelled to operate under international regulatory clauses as a condition for licensing. For instance, in the health insurance sector, compliance with regulations and assurance frameworks such as HIPAA, HITRUST, and SOC 2 Type 1 should be mandatory to ensure the privacy of consumer data.
To tackle cloud security issues, a robust multi-layered security system must be implemented. This should include continuous oversight by cybersecurity teams at each access point. Additionally, data stored in the cloud should be encrypted. While encryption is a common practice in data analytics, many firms still neglect this crucial security measure. It should be mandatory for all companies to use encryption to ensure data security.
When it comes to training LLMs, companies should only collect and use the necessary data to enhance model performance. This data should come from public sources such as websites, books, and publicly available reports, not from personal or confidential information.[1]
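One way to picture this kind of data minimisation is a redaction step applied to text before it enters a training corpus. The Python sketch below is an assumption-laden illustration, not a method described in the cited source: the function name `redact` and the two regular expressions are hypothetical, and the patterns are deliberately simple, so they will miss many forms of personal data and may occasionally match harmless number sequences.

```python
import re

# Illustrative patterns only; real personal-data detection needs far broader coverage.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact(text: str) -> str:
    """Replace e-mail addresses and phone-like numbers with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


sample = "Contact jane.doe@example.com or +1 (555) 010-9999 today."
print(redact(sample))  # Contact [EMAIL] or [PHONE] today.
```

A filter like this reduces, but does not eliminate, the personal information that reaches a model, which is why the choice of data sources remains the primary safeguard.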
Global cooperation and learning from international authorities and regulators are essential to address the legal issues surrounding privacy breaches. In the United States, the Federal Trade Commission (FTC) addresses cases where firms use personal data for illegal or inappropriate purposes, considering such actions beyond the scope of participant consent. In such instances, companies may be required to delete the data immediately, and in some cases, destroy the AI models built using that data.
The General Data Protection Regulation (GDPR) is a unified data privacy law across the European Union. It mandates that companies notify individuals about data collection and have lawful bases for processing data unless an exemption applies. Non-compliance can result in significant penalties. For example, a company in Poland was fined €220,000 by the Polish supervisory authority for processing contact data scraped from public registers without notifying individuals.[2]
There is a pressing need to question big data giants about their legal basis for processing personal information. In March 2023, the Italian Data Protection Authority temporarily banned ChatGPT and questioned OpenAI over its legal basis for data processing.[3] This highlights the importance of ensuring companies have legitimate grounds for using personal data.
Conclusion
In today’s world, data has become more than just a set of information; it is one of the most valuable assets. Technological advancements in data science and AI bring about significant benefits but also pose substantial risks regarding unethical data use and breaches of personal data. Data privacy breaches are serious issues that can lead to violations of fundamental rights. Therefore, these breaches must be addressed with utmost seriousness, with proper laws and regulations in place. Companies that fail to comply with these regulatory rules and requirements must be held accountable to protect individuals’ privacy rights.
[1] Amy Vinograd, Loose-Lipped Large Language Models Spill Your Secrets: The Privacy Implications of Large Language Models, 36 HARV. J.L. & TECH. 616, 650 (2023), https://jolt.law.harvard.edu/assets/articlePDFs/v36/Intro-Pages-36.2.pdf.
[2] Kristof Van Quathem & Anna Oberschelp de Maneses, Polish Supervisory Authority Issues GDPR Fine for Data Scraping Without Informing Individuals, INSIDE PRIV. (April 4, 2019), https://www.insideprivacy.com/data-privacy/polish-supervisory-authority-issues-gdpr-fine-for-data-scraping-without-informing-individuals [https://perma.cc/K27N-NRZW].
[3] See Melissa Heikkilä, OpenAI’s Hunger for Data Is Coming Back to Bite It, MIT TECH. REV. (April 19, 2023), https://www.technologyreview.com/2023/04/19/1071789/openais-hunger-for-data-is-coming-back-to-bite-it/ [https://perma.cc/5LDL-XHVJ].