Post#25: Dr Susie Weller, Prof Rosalind Edwards, Prof Lynn Jamieson and Dr Emma Davidson: Selecting data sets to create new assemblages

The focus of today’s blog is on the process of identifying qualitative material from multiple archived data sets and bringing it together for secondary analysis. This process is the first stage in a four-step breadth-and-depth method we developed for analysing large volumes of qualitative data. We draw on our experiences of conducting the ESRC National Centre for Research Methods project, of which the Big Qual Analysis Resource Hub is an outcome. Utilising different qualitative longitudinal research (QLR) data sets housed in the Timescapes Archive, our project aimed to explore the possibilities for developing new procedures for working across multiple sets of archived qualitative data. The blog is based on our forthcoming chapter in Kahryn Hughes and Anna Tarrant’s book ‘Advances in Qualitative Secondary Analysis’ (Sage).

Selecting data sets to create new assemblages

The volume of complex qualitative data available for secondary analysis is growing. Indeed, major research funding bodies in the U.K. regard the sharing of data as vital to accountability and transparency and, for some, it is a contractual requirement. Furthermore, the increasing influence of big data, which has until now generally concerned large-scale quantitative data sets, highlights the potential for researchers to enhance further the value of existing qualitative investments. However, the full potential of archived qualitative data has yet to be realised.

The development of central and local digital repositories opens up exciting possibilities for doing new research using existing data sets. With that comes the opportunity to bring together one or more data sets into a new assemblage in order to ask new questions of the data, make comparisons, explore how processes work in different contexts, and provide new insights.

Major contemporary online qualitative archival sources established internationally for data preservation and sharing include (see also the Registry of Research Data Repositories):

Many of these data repositories have been designed with re-use in mind and material is accompanied by documentation about the original project such as: aims and objectives, the methodology, sample and methods, and units of analysis, as well as file types and formats; in other words, descriptive, structural and administrative ‘meta data’ about the data set. Registration, including signing an ‘end user’ agreement or licence, is usually a requirement prior to gaining access and downloading data sets. Such agreements often contain clauses around the use, storage and sharing of data.

Identifying appropriate qualitative material for a given project involves exploring the data that is available in an archive or across several archives. You could bring together data from many different projects housed in one archive, as we have done. Alternatively, data sets from different repositories could be synthesised, or you could search for archived material to bring into conversation with your own data.

The aim of this initial search is to gain a precursory understanding of the nature, quality and ‘fit’ with the research topic of the available small-scale data sets. We saw parallels between this process and that of an archaeologist’s aerial survey. We felt we needed to fly systematically across a data landscape to get a good overview. This part of the process is likely to be time-consuming. It can be wide-ranging, for example, locating data sets on a broad topic area, or it could be quite narrow, focused on searching for data to fit a specific substantive issue or set of research questions. As part of this initial identification of data sets we found it useful to explore some of the outputs produced by the original researchers.

The process of searching within a given archive varies. The UK Data Service (UKDS), for instance, features the ‘Discover’ search function for reviewing its data catalogue, which includes the option to filter for qualitative data sources. The search function in the Timescapes Archive allows browsing by project, concept or descriptive word, enabling searches by criteria such as gender or employment status. This approach relies on the keywords assigned to each data item by the original research team, so data of interest may not come up in a descriptive word search. New forms of searching are currently in development. In archives such as ‘Qualibank’, accessed via the UKDS, detailed searches can be conducted across the content of the entire collection, although at present this comprises only a small collection of classic studies. Using international archives can raise further challenges, such as searching for terms in different or multiple languages, or making appropriate translations.

Searches within an archive or archives are guided by the researchers’ own questions, research topic, and geographic or linguistic context. These help in deciding which data sets, or which parts of multiple small-scale data sets, to include in or exclude from the larger, combined data set to be constructed, which we refer to as our data assemblage. This unique assemblage can be viewed as a new data set, with its own methodological history and the potential to be curated and used by other researchers.

In our study, we surveyed the parameters of six of the core data sets deposited in the Timescapes Archive. We initially kept the six projects separate in order to get a sense of the scope and nature of each of the data sets. We mapped the studies, explored the state and volume of the data, viewed any contextual material and metadata available, logged the research tools used, and gained an overview of the substantive emphasis of each project. We then used the qualitative analysis software NVivo to help us manage the volume of data. As part of this process, we decided to harmonise file names to aid retrieval and to reorganise the files from their original data sets into new groupings, by gender and cohort generation, based on our substantive focus and chosen unit of analysis for cases. It was at this point that the individual data sets were merged into our new data assemblage. You can read more about our breadth-and-depth method for qualitative analysis in our paper: Big data, qualitative style: a breadth-and-depth method for working with large amounts of secondary qualitative data, Quality & Quantity, 53(1): 363–376. We have also made our data assemblage available in the Timescapes Archive (coming soon).
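
For readers who manage their files outside a package such as NVivo, the harmonising and regrouping step described above could be sketched in a few lines of Python. This is an illustration only, not the procedure we followed: the project labels, file names, cohort labels and naming pattern are all hypothetical, and the sketch assumes that gender and cohort are already recorded for each file.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    """One archived file plus the attributes used for regrouping (all values hypothetical)."""
    project: str
    original_name: str
    gender: str
    cohort: str


def slug(text: str) -> str:
    """Lower-case a label and replace runs of other characters with a hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def harmonised_name(t: Transcript, seq: int) -> str:
    """Build a consistent name: project_gender_cohort_sequence.extension."""
    ext = t.original_name.rsplit(".", 1)[-1].lower() if "." in t.original_name else "txt"
    return f"{slug(t.project)}_{slug(t.gender)}_{slug(t.cohort)}_{seq:03d}.{ext}"


def build_assemblage(transcripts):
    """Regroup files from separate projects by (gender, cohort)."""
    groups = defaultdict(list)
    per_project_count = defaultdict(int)  # sequence numbers restart for each project
    for t in transcripts:
        per_project_count[t.project] += 1
        new_name = harmonised_name(t, per_project_count[t.project])
        groups[(t.gender, t.cohort)].append(new_name)
    return dict(groups)


# Hypothetical records standing in for files drawn from two projects.
records = [
    Transcript("Project A", "Int 01 FINAL.RTF", "female", "1950s"),
    Transcript("Project B", "w2_interview-7.doc", "male", "1950s"),
    Transcript("Project A", "Int 02.rtf", "male", "1950s"),
]

assemblage = build_assemblage(records)
for group, names in assemblage.items():
    print(group, names)
```

Keeping the project label inside each new file name means the original data set of every file remains traceable after the merge, which matters if the assemblage is to carry its own methodological history.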




