Algorithms used in machine learning systems and artificial intelligence (AI) can only be as good as the data used for their development. High quality data are essential for high quality algorithms. Yet, the call for high quality data in discussions around AI often comes without further specification or guidance as to what this actually means. Since all data collections are subject to several sources of error, users of AI-related technology need to know where the data come from and what their potential shortcomings are. AI systems based on incomplete or biased data can lead to inaccurate outcomes that infringe on people’s fundamental rights, including the right to non-discrimination. Being transparent about which data are used in AI systems helps to prevent possible rights violations. This is especially important in times of big data, where the volume of data is sometimes valued over its quality.
The European Union’s strong fundamental rights framework, as enshrined in the Charter of Fundamental Rights and related case law, provides guidance for the development of guidelines and recommendations on the use of AI. This paper sets out to contribute to the many ongoing policy discussions around AI and big data by highlighting the importance of being aware of, and avoiding, poor data quality. It does not aim to explain how to use high quality data, but rather how to recognise and avoid using low quality data.