Provides essential data security guidance for organisations that develop AI or ML systems

The Australian Signals Directorate (ASD), in collaboration with international partners, has released new advice on best practices for securing data throughout the artificial intelligence (AI) and machine learning (ML) system lifecycle.

The cyber security advice highlights the importance of data security in ensuring the accuracy and integrity of AI outcomes, and outlines potential risks arising from data integrity issues at various stages of AI development and deployment.

It also provides essential data security guidance for organisations that develop and/or use AI systems, and is aimed primarily at those using AI systems in their operations, with a focus on protecting sensitive, proprietary or mission-critical data. The principles outlined are meant to provide a "robust" foundation for securing AI data and ensuring the reliability and accuracy of AI-driven outcomes.

According to the ASD, data security is of paramount importance when developing and operating AI systems. As organisations in various sectors rely more and more on AI-driven outcomes, data security becomes crucial for maintaining accuracy, reliability and integrity.

The guidance provided in the ASD's cybersecurity information sheet (CSI) outlines a "robust approach to securing AI data and addressing the risks associated with the data supply chain, malicious data and data drift".

Data security is an ever-evolving field, and continuous vigilance and adaptation are key to staying ahead of emerging threats and vulnerabilities, the CSI noted. The best practices presented encourage the highest standards of data security in AI while helping ensure the accuracy and integrity of AI-driven outcomes.

AI system lifecycle

When it comes to the AI system lifecycle, securing data is paramount to maintaining information integrity and system reliability. The CSI advises that, starting in the initial 'plan and design' phase, organisations carefully plan data protection measures to proactively mitigate potential risks.

In the next phase, 'collect and process data', data must be carefully analysed, labelled, sanitised and protected from breaches and tampering (a checksum sketch follows this section). Securing data is equally paramount in the 'build and use model' phase, to help ensure models are trained on reliably sourced, accurate and representative information.

In the 'verify and validate' phase, organisations are urged to rigorously test AI models built from training data to uncover security flaws and address them effectively. This stage will be necessary each time new data or user feedback is introduced into the model, and that data must be handled to the same security standards as the original training data.

Implementing strict access controls protects data from unauthorised access, especially in the 'deploy and use' phase, while continuous data risk assessments in the 'operate and monitor' phase will help the model adapt to evolving threats.

"Neglecting these practices can lead to data corruption, compromised models, data leaks, and non-compliance, emphasising the critical importance of robust data security at every phase," noted the CSI.
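To make the 'collect and process data' phase concrete, the sketch below shows one common way to detect tampering in storage or transit: verifying a dataset file against a known-good SHA-256 digest before it enters the training pipeline. The file name and digest are illustrative assumptions, not details from the ASD guidance.

```python
# Minimal integrity-check sketch: compare a dataset file's SHA-256 digest
# against a known-good value obtained over a separate, trusted channel.
import hashlib
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large datasets don't exhaust memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_dataset(path: Path, expected: str) -> None:
    actual = sha256_of(path)
    if actual != expected:
        raise ValueError(f"Integrity check failed for {path}: got {actual}")

# Example usage (hypothetical file and digest):
# verify_dataset(Path("train.csv"), "9f86d081884c7d659a2feaa0c55ad015...")
```

A failed check should quarantine the file rather than silently drop it, so the event can feed the continuous risk assessments described above.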
Best practices to secure data for AI-based systems

Included in the CSI was a list of recommended practical steps that system owners can take to better protect the data used to build and operate their AI-based systems, regardless of whether those systems run on premises or in the cloud. The list recommends:

1. Source reliable data and track data provenance. Verify that data sources use trusted, reliable and accurate data for training and operating AI systems. To the extent possible, only use data from authoritative sources.

2. Verify and maintain data integrity during storage and transport. Maintaining data integrity is essential to preserving the accuracy, reliability and trustworthiness of AI data.

3. Employ digital signatures to authenticate trusted data revisions. Digital signatures help ensure data integrity and prevent tampering by third parties. Adopt quantum-resistant digital signature standards to authenticate and verify datasets used during AI model training, fine-tuning, alignment, reinforcement learning from human feedback (RLHF) and/or other post-training processes that affect model parameters (a signing sketch follows this list).

4. Leverage trusted infrastructure. Use a trusted computing environment that leverages Zero Trust architecture, and provide secure enclaves for data processing so that sensitive information stays protected and unaltered during computations.

5. Classify data and use access controls. Categorise data using a classification system based on sensitivity and required protection measures. In general, the output of AI systems should be classified at the same level as the input data, rather than creating a separate set of guardrails (a classification sketch follows this list).

6. Encrypt data. Adopt advanced encryption protocols proportional to the organisational data protection level, securing data at rest, in transit and during processing. AES-256 encryption is the de facto industry standard and is considered resistant to quantum computing threats (an encryption sketch follows this list).

7. Store data securely. Store data in certified storage devices that enforce NIST FIPS 140-3 compliance, ensuring that the cryptographic modules used to encrypt the data provide high-level security against advanced intrusion attempts.

8. Leverage privacy-preserving techniques. Several privacy-preserving techniques can be leveraged for increased data security, including data depersonalisation techniques, differential privacy and decentralised learning techniques (a differential privacy sketch follows this list).

9. Delete data securely. Prior to repurposing or decommissioning any functional drives used for AI data storage and processing, erase them using a secure deletion method such as cryptographic erase (sketched after this list), block erase or data overwrite.

10. Conduct ongoing data security risk assessments. Conduct ongoing risk assessments using industry-standard frameworks, such as the NIST SP 800-37r2 Risk Management Framework and the NIST AI 100-1 Artificial Intelligence Risk Management Framework.
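To illustrate practice 3, the following sketch signs and verifies a dataset manifest. The guidance calls for quantum-resistant signature standards; Ed25519 is used here purely to show the sign-and-verify workflow, since post-quantum schemes such as ML-DSA are not yet available in every library. The manifest contents are hypothetical, and the example assumes the third-party Python `cryptography` package.

```python
# Sketch: a data producer signs a manifest of dataset hashes; consumers
# verify the signature before trusting the revision. Ed25519 stands in
# for the quantum-resistant scheme the guidance actually recommends.
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

private_key = Ed25519PrivateKey.generate()   # producer's signing key
manifest = b"train.csv sha256=9f86d081...\nval.csv sha256=a1b2c3d4...\n"
signature = private_key.sign(manifest)

# The consumer verifies using the producer's published public key.
public_key = private_key.public_key()
try:
    public_key.verify(signature, manifest)
    print("Manifest authentic: safe to use this data revision.")
except InvalidSignature:
    print("Manifest tampered with: reject this data revision.")
```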
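Practice 5's rule that AI output inherits the classification of its input can be captured in a few lines. The levels below are hypothetical placeholders for an organisation's own scheme.

```python
# Sketch: an AI system's output is classified at the level of its most
# sensitive input, rather than given a separate set of guardrails.
from enum import IntEnum

class Classification(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    SECRET = 3

def output_classification(*inputs: Classification) -> Classification:
    """The output inherits the highest classification among its inputs."""
    return max(inputs)

print(output_classification(Classification.INTERNAL, Classification.SECRET).name)
# SECRET
```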
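For practice 6, the sketch below encrypts a record at rest with AES-256-GCM, matching the AES-256 standard the information sheet cites. Key management (an HSM or cloud KMS) is assumed and out of scope; the example again relies on the third-party `cryptography` package.

```python
# Sketch: AES-256-GCM encryption at rest. GCM also authenticates the
# ciphertext, so decryption fails if the stored data has been tampered with.
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=256)   # in practice, fetch from a KMS
aesgcm = AESGCM(key)

plaintext = b"sensitive training record"
nonce = os.urandom(12)                      # must be unique per encryption
ciphertext = aesgcm.encrypt(nonce, plaintext, b"dataset-v3")  # bound to context

# Store the nonce alongside the ciphertext; decryption verifies integrity.
assert aesgcm.decrypt(nonce, ciphertext, b"dataset-v3") == plaintext
```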
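Among the privacy-preserving techniques in practice 8, differential privacy is the easiest to sketch: the Laplace mechanism below releases a noisy count so that no single record measurably changes the answer. The epsilon value and query are illustrative choices, not recommendations from the guidance.

```python
# Sketch: the Laplace mechanism for a differentially private count.
# A count query has sensitivity 1, so the noise scale is 1/epsilon.
import numpy as np

def dp_count(records, predicate, epsilon: float = 1.0) -> float:
    true_count = sum(1 for r in records if predicate(r))
    return true_count + np.random.laplace(loc=0.0, scale=1.0 / epsilon)

ages = [23, 35, 41, 52, 29, 67]
print(dp_count(ages, lambda a: a > 40))   # noisy answer near the true 3
```

Smaller epsilon values give stronger privacy at the cost of noisier answers.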
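Of the secure-deletion methods in practice 9, cryptographic erase deserves a short sketch: if stored data is always encrypted, destroying the only copy of the key renders the ciphertext unrecoverable without overwriting every block. Real deployments do this in the drive controller or key management service, not in application code; the example below is conceptual only.

```python
# Conceptual sketch of cryptographic erase: encrypted data becomes
# unreadable once its key is destroyed, even though the ciphertext remains.
# (Illustrative only: Python may keep immutable key copies in memory.)
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = bytearray(AESGCM.generate_key(bit_length=256))
ciphertext = AESGCM(bytes(key)).encrypt(os.urandom(12), b"AI training data", None)

# "Erase" by zeroising the only managed copy of the key.
for i in range(len(key)):
    key[i] = 0
# The ciphertext still exists on disk but is computationally unrecoverable.
```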
General risk for data consumers

According to the CSI, from the moment data is ingested for use with AI systems, the data acquirer must secure it against insider threats and malicious network activity to prevent unauthorised modification. The use of web-scale datasets carries all the risks outlined earlier, and data consumers cannot "simply assume that these datasets are clean, accurate, and free of malicious content". Web-scraped data used by a third party to train a model for downstream tasks could also affect the model's learning process and result in behaviour that was unintended by the AI system designer.

For mitigation strategies, the CSI recommends:

Dataset verification: Once a dataset is ingested, the consumer or curator should verify, as much as possible, whether it is free of malicious or inaccurate material.

Content credentials: Use content credentials to track the provenance of media and other data.

Foundation model assurances: Where a foundation model is trained by another party, the developers of the foundation model need to be able to provide assurances regarding the data and sources used, and certify that their training data did not contain any known compromised data.

Require certification: Data consumers should strongly consider requiring formal certification from dataset and model providers, attesting that their systems are free from known compromised data, before using third-party data and/or foundation models.

Secure storage: Data needs to be stored in a database that adheres to the best practices for digital signatures, data integrity and data provenance described above.

The CSI was created by the Australian Signals Directorate's Australian Cyber Security Centre (ACSC), together with New Zealand's Government Communications Security Bureau's National Cyber Security Centre (NCSC-NZ), the US National Security Agency's Artificial Intelligence Security Centre (AISC), the Cybersecurity and Infrastructure Security Agency (CISA), the Federal Bureau of Investigation (FBI), and the UK's National Cyber Security Centre (NCSC-UK).