As you probably know, there is a publicly available database which contains a lot of information on the majority of clinical trials (at least on trials with US citizens) started since 1983.
In this post I try to show what information is stored in this database and how you can manage it with free statistical tools.
I give a detailed description of the ID structure and offer solutions for specific scientific questions.
The questions I try to answer in this short presentation:
- How to determine the number of ‘Recruiting’ sites.
- How to generate a list of cities with the total number of recruiting facilities.
- How to plot the ‘Recruiting’ sites on a Google map.
The data can be downloaded from
If you choose the pipe-delimited text files, you can read the content with any text editor (I would recommend Notepad++).
If you have some statistical background, and especially if you have access to SAS, you can download the SAS transport files as well.
After downloading a zipped file of close to 2 GB, you’ll get a set of 40 files.
One tool that can be used to manage these files is R, or RStudio, a graphical environment built on top of R.
As stated on the webpage http://aact.ctti-clinicaltrials.org, you can read the downloaded files with the following code:
read.table(file = "id_information.txt", header = TRUE, sep = "|", na.strings = "", comment.char = "", quote = "\"", fill = FALSE, nrows = 200000)
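To show what these read.table() options do, here is a small sketch that parses a pipe-delimited snippet from a string instead of from a file. The rows and the brief_title value are made up. Only the layout mimics the AACT text files.

```r
# A tiny, made-up pipe-delimited sample in the same layout as the AACT text files
txt <- 'nct_id|overall_status|brief_title
NCT00000001|Recruiting|"Study A | pilot"
NCT00000002||Study B'

df <- read.table(text = txt, header = TRUE, sep = "|",
                 na.strings = "",    # empty fields (e.g. "||") become NA
                 comment.char = "",  # a "#" inside free text is not treated as a comment
                 quote = "\"",       # only double quotes protect an embedded "|"
                 fill = FALSE, stringsAsFactors = FALSE)

str(df)
```

The quoted title keeps its embedded pipe, and the empty status field is read as NA instead of an empty string.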
The most important file is the Studies table. Among other things, it contains information on:
- last verification date
- number of arms and groups.
The file contains data on more than 251 thousand studies (only the first 1000 can be found on our site).
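The Studies and Facilities tables used below share the nct_id column. ClinicalTrials.gov identifiers have the form NCT followed by eight digits. A quick sanity check of this key before merging could look like the sketch below. The identifiers are made up.

```r
# Made-up identifiers; only the first one follows the NCT + 8 digits pattern
ids <- c("NCT01234567", "NCT1234", "nct01234567", NA)

# TRUE only for well-formed identifiers (NA is treated as invalid)
is_valid_nct <- !is.na(ids) & grepl("^NCT[0-9]{8}$", ids)
print(is_valid_nct)

# Identifiers that would need a closer look before merging
bad_ids <- ids[!is_valid_nct]
```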
Task 1: Determine how many open studies (overall status = ‘RECRUITING’) can be found, tabulated by site.
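Before the facilities come into play, the number of open studies can already be counted on the Studies table alone. Here is a minimal sketch on a hypothetical four-row stand-in. In practice, studies comes from read.table() as above. This post spells the status both as ‘RECRUITING’ and ‘Recruiting’, so check the table() output for the exact spelling in your download.

```r
# Hypothetical stand-in for the Studies table
studies <- data.frame(
  nct_id = c("NCT00000001", "NCT00000002", "NCT00000003", "NCT00000004"),
  overall_status = c("Recruiting", "Completed", "Recruiting", NA),
  stringsAsFactors = FALSE
)

# Frequency of each overall status; useNA = "ifany" keeps missing statuses visible
status_counts <- table(studies$overall_status, useNA = "ifany")
print(status_counts)

# Number of open (recruiting) studies; adjust the spelling to match your data
n_recruiting <- sum(studies$overall_status == "Recruiting", na.rm = TRUE)
```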
We have to rely on the Facilities and Studies tables. The first 1000 records of the Facilities table can be checked here.
To get a data set containing both study-level and facility-level data, you have to merge the two tables.
In R, this can be done with the following commands:
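library(Hmisc)
library(data.table)  # provides setDT() and the [, .N, by = ] syntax used below
library(DT)          # provides datatable() for interactive tables

# nrows = 5000 reads only the first 5000 rows of each file; drop it to load the full tables
studies <- read.table("DIR/studies.txt", header = TRUE, sep = "|", na.strings = "",
                      comment.char = "", quote = "\"", fill = FALSE, nrows = 5000)
facilities <- read.table("DIR/facilities.txt", header = TRUE, sep = "|", na.strings = "",
                         comment.char = "", quote = "\"", fill = FALSE, nrows = 5000)

# join the two tables on the trial identifier
sites <- merge(studies, facilities, by = "nct_id")

# keep only the columns needed for the site-level analysis
my <- c("nct_id", "overall_status", "city", "state", "zip", "country", "name")
sitesa <- sites[my]

# lower-case the city names so they can be matched against the coordinate file later
sitesa$city <- tolower(sitesa$city)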
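To see what merge() does with the two tables, here is a sketch on hypothetical mini versions of Studies and Facilities. By default merge() performs an inner join. A study with several facilities yields several rows, and studies or facilities without a partner are dropped. This is worth keeping in mind when only the first 5000 rows of each file are read.

```r
# Hypothetical mini versions of the two tables, linked by nct_id
studies <- data.frame(
  nct_id = c("NCT00000001", "NCT00000002", "NCT00000003"),
  overall_status = c("Recruiting", "Completed", "Recruiting"),
  stringsAsFactors = FALSE
)
facilities <- data.frame(
  nct_id = c("NCT00000001", "NCT00000001", "NCT00000002", "NCT00000099"),
  name = c("Site A", "Site B", "Site C", "Site D"),
  city = c("Boston", "Chicago", "Boston", "Denver"),
  stringsAsFactors = FALSE
)

# Inner join: one output row per facility whose study is also present in `studies`
sites <- merge(studies, facilities, by = "nct_id")
print(sites)

# all.x = TRUE keeps studies without any facility (their facility columns become NA)
sites_all <- merge(studies, facilities, by = "nct_id", all.x = TRUE)
```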
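If you would like a table of sites with ‘Recruiting’ status, you can obtain a table like this:
with the following command (it uses sitesa_c_final, which is created in the coordinate step further below, so run that code first):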
datatable(setDT(sitesa_c_final)[, .N, by = .(overall_status,city)][order(-N)])
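The data.table expression counts rows, so .N gives the number of facility rows per status and city. A study with two sites in the same city is counted twice. The base R sketch below, on hypothetical data, computes both the number of recruiting sites and the number of distinct recruiting studies per city, so the two can be compared.

```r
# Hypothetical site-level data with the same columns as sitesa
sitesa <- data.frame(
  nct_id = c("NCT00000001", "NCT00000001", "NCT00000002", "NCT00000003", "NCT00000003"),
  overall_status = c("Recruiting", "Recruiting", "Completed", "Recruiting", "Recruiting"),
  city = c("boston", "boston", "boston", "chicago", "boston"),
  stringsAsFactors = FALSE
)

rec <- subset(sitesa, overall_status == "Recruiting")

# Facility rows per city (what .N counts) and distinct studies per city
n_sites <- tapply(rec$nct_id, rec$city, length)
n_studies <- tapply(rec$nct_id, rec$city, function(x) length(unique(x)))

result <- data.frame(city = names(n_sites),
                     n_sites = as.vector(n_sites),
                     n_studies = as.vector(n_studies[names(n_sites)]),
                     stringsAsFactors = FALSE)
result <- result[order(-result$n_sites), ]
print(result)
```

Here Boston has three recruiting sites but only two distinct recruiting studies.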
What if you would like to show the status of the sites on a Google map? That is possible too, but I would recommend switching from RStudio to KNIME.
To place the sites on a map, you’ll need their coordinates; here the coordinates of the city each site is located in are used. The good news is that this information is also available for free: you can download the necessary database from the MaxMind site (https://www.maxmind.com/en/free-world-cities-database).
The coordinates can be added to the city-level data with the following code:
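# MaxMind world cities file; its City column is lower-case, hence tolower() above
coords <- read.table("e:/_job/clinicaltrials.gov/worldcities/worldcitiespop.txt", header = TRUE, sep = ",",
                     na.strings = "", comment.char = "", quote = "\"", fill = FALSE)

# attach coordinates by city name
# caution: a city name that occurs in several countries matches several coordinate rows
sitesa_c <- merge(sitesa, coords, by.x = "city", by.y = "City")

# keep only the sites of recruiting studies
sitesa_c_final <- subset(sitesa_c, overall_status == "Recruiting")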
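Because the join uses the city name alone, a site in a city whose name also exists elsewhere picks up one coordinate row per match. The sketch below shows this on hypothetical data and flags the affected studies. The coordinate values are illustrative. I also assume that the coordinate table has Country, City, Latitude and Longitude columns, like the MaxMind file.

```r
# Hypothetical recruiting sites (city already lower-cased)
sitesa <- data.frame(
  nct_id = c("NCT00000001", "NCT00000002"),
  city = c("paris", "boston"),
  country = c("France", "United States"),
  stringsAsFactors = FALSE
)

# Hypothetical coordinate table; "paris" occurs in two countries
coords <- data.frame(
  Country = c("fr", "us", "us"),
  City = c("paris", "paris", "boston"),
  Latitude = c(48.87, 33.66, 42.36),
  Longitude = c(2.33, -95.56, -71.06),
  stringsAsFactors = FALSE
)

# Joining on the city name alone: the Paris site gets both coordinate rows
joined <- merge(sitesa, coords, by.x = "city", by.y = "City")
print(joined)

# Studies that received more than one coordinate match and need a manual check
dup_ids <- unique(joined$nct_id[duplicated(joined$nct_id)])
```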
The sitesa_c_final table is then passed to KNIME, where the following actions should be done:
The outcome looks like this: the sites shown (labelled by their names) are the ones with ‘Recruiting’ status.