National Statement on Ethical Conduct in Human Research (2007) (Updated May 2015)
"This section discusses various research methods and fields. Some chapters are a result of the further expansion of this revised National Statement beyond health and medical research. The focus is on general principles – the section is not intended to be exhaustive. It reflects the interdisciplinary nature of many types of research and the use, in some research projects, of a number of different research methods."
This learning module, from the Responsible Conduct in Data Management tutorial developed by Northern Illinois University, describes issues related to maintaining the integrity of the data collection process.
"Data collection is the process of gathering and measuring information on variables of interest, in an established systematic fashion that enables one to answer stated research questions, test hypotheses, and evaluate outcomes. The data collection component of research is common to all fields of study including physical and social sciences, humanities, business, etc. While methods vary by discipline, the emphasis on ensuring accurate and honest collection remains the same."
The broad term ‘anonymisation’ can be used to cover various techniques which convert personal data into de-identified data, to assist in making rich data resources available whilst protecting individuals’ privacy.
Some common techniques are covered here, but each has pros and cons and may not remove risk completely, depending on the data and on any additional information an intruder has access to.
The information here is from Anonymisation: managing data protection risk code of practice, published by the UK Information Commissioner’s Office, which has more detail about each of these techniques, as well as case studies.
Data masking: this involves stripping out obvious personal identifiers, such as names, from a piece of information to create a data set in which no personal identifiers are present. The removal can be partial, or a linking ‘key’ can be kept so that records can still be matched.
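Stripping direct identifiers from a record could be sketched as follows. This is a minimal illustration, not a method taken from the code of practice: the function name `mask_record`, the field names and the sample record are all hypothetical, and deciding which fields count as identifiers depends on the data set.

```python
def mask_record(record, identifier_fields=("name", "email", "phone")):
    """Return a copy of the record without the listed direct identifiers.

    The default field names are assumptions made for this example only.
    """
    return {key: value for key, value in record.items()
            if key not in identifier_fields}


# Hypothetical record used only for illustration.
record = {"name": "Jane Doe", "email": "jane@example.org",
          "age": 34, "diagnosis": "asthma"}
masked = mask_record(record)
```

The remaining fields (here, age and diagnosis) may still identify someone in combination, so masking alone may not remove risk completely.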
Pseudonymisation de-identifies data by attaching a coded reference or pseudonym to each record, so that the data can be associated with a particular individual without that individual being identified.
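One possible way to attach a coded reference to a record is a keyed hash of the original identifier, sketched below with Python's standard `hmac` and `hashlib` modules. This is an assumption chosen for illustration, not a technique prescribed by the source; the function name `pseudonymise`, the key and the sample identifiers are hypothetical.

```python
import hashlib
import hmac


def pseudonymise(identifier: str, secret_key: bytes) -> str:
    """Derive a stable pseudonym from an identifier using a keyed hash.

    The same identifier and key always give the same pseudonym, so
    records about one individual can still be linked to each other.
    """
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]  # shortened for readability


# Hypothetical key; in practice it would be stored apart from the data.
key = b"example-key-kept-separately"
code_a = pseudonymise("jane.doe@example.org", key)
code_b = pseudonymise("john.roe@example.org", key)
```

Anyone who holds the key can recompute a pseudonym from a known identifier, so the key needs the same protection as the identifiers themselves.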
Aggregation displays data as totals, so no data relating to or identifying any individual is shown. Small numbers in totals are often suppressed through ‘blurring’ or by being omitted altogether.
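Displaying totals and suppressing small numbers could look like the sketch below. The threshold of 5 is an assumption for illustration only (the source does not state a value), and the function name and sample wards are hypothetical.

```python
from collections import Counter


def aggregate_with_suppression(values, threshold=5):
    """Count occurrences per category, replacing small counts with None.

    A count below the threshold is suppressed rather than published.
    """
    counts = Counter(values)
    return {category: (count if count >= threshold else None)
            for category, count in counts.items()}


# Hypothetical ward of residence for nine individuals.
wards = ["North"] * 7 + ["South"] * 2
totals = aggregate_with_suppression(wards)
```

Here the total for "North" is published as 7, while the total for "South" is withheld because only two individuals fall into it.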
Derived data is a set of values that reflect the character of the source data but hide the exact original values. This is usually done by using banding techniques to produce coarser-grained descriptions of values than in the source dataset, e.g. replacing dates of birth with ages or years, replacing addresses with areas of residence or wards, using partial postcodes, or rounding exact figures so they appear in a normalised form.
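The banding ideas above can be sketched with a few small helpers. The band width, the rounding base and the function names are assumptions made for this example, and `partial_postcode` assumes the postcode is written with a space between its two parts.

```python
def age_band(age, width=10):
    """Replace an exact age with a band such as '30-39'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def partial_postcode(postcode):
    """Keep only the first part of a postcode written with a space."""
    return postcode.strip().split()[0].upper()


def round_to(value, base=100):
    """Round an exact figure to the nearest multiple of base."""
    return base * round(value / base)


# Hypothetical source values used only for illustration.
derived = {
    "age": age_band(34),
    "postcode": partial_postcode("sw1a 1aa"),
    "income": round_to(2340),
}
```

Wider bands hide more detail but also make the derived data less useful, so the choice of width is a trade-off for each data set.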
Remember to factor in the time and money to prepare data for sharing.
A recent paper published in Trials (BioMed Central) examined these factors and, from two examples, established that around 40 to 50 hours of staff time and £2,000 to £3,000 were needed to prepare patient-level data from clinical trials.
The UK Data Archive provides a comprehensive overview of data documentation at various levels of granularity.
"A crucial part of making data user-friendly, shareable and with long-lasting usability is to ensure they can be understood and interpreted by any user. This requires clear data description, annotation, contextual information and documentation."