How to be Data Smart with R Programming and Accelerate Clinical Trial Efficiency

Technology-led Navitas Life Sciences stepped up its digital initiatives, especially during the pandemic, with a focus on innovation, creating fertile ground for enhanced clinical trial efficiency. This makes it more imperative than ever to find effective solutions for data management.

Navitas Life Sciences Data Support Services

Every stage of the clinical trial process requires the support of biostatisticians: trial design, protocol development, data management, study monitoring, data analysis, and reporting of results. Whether your clinical trial is in the planning stage or already under way, our experienced and proficient biostatisticians can effectively support you.


We will work with you to develop:

  • Primary and Secondary Objectives
  • Sample Size and Power Estimation
  • Randomization and Blinding of Sponsor Team
  • Global Definitions and Conventions, Analysis Windows
  • Rules for handling Missing Data
  • Definitions of Populations: Efficacy-evaluable, Intention-to-Treat (ITT), Safety and other sub-groups
  • Interim Statistical Analysis
  • Demographics and Baseline Characteristics
  • Subject Disposition and Compliance to Study Treatment
  • Method for Analysis of Efficacy Endpoints
  • Evaluation of Safety Parameters
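
As an illustration of the sample size and power estimation item above, the following is a minimal sketch of the standard normal-approximation formula for a two-arm comparison of means, n = 2 (z_(1-alpha/2) + z_power)^2 sigma^2 / delta^2 per arm. It is written in Python for illustration only; the function name and the input values are hypothetical and do not come from any Navitas procedure. A real study would rely on the assumptions agreed with the biostatistician.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(delta, sigma, alpha=0.05, power=0.80):
    """Approximate subjects per arm for a two-sided, two-sample comparison of means.

    delta: smallest treatment difference worth detecting (assumed input)
    sigma: common standard deviation of the endpoint (assumed input)
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the target power
    n = 2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2
    return math.ceil(n)  # round up to a whole subject


# Hypothetical inputs: detect a 5-unit difference when the SD is 10.
n_per_arm = sample_size_per_arm(delta=5.0, sigma=10.0)
print(n_per_arm)
```

Raising the target power or shrinking the detectable difference increases the required sample size, which is why these inputs are fixed during trial design.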

R Programming to Drive Efficiency

R is a freely available, comprehensive, platform-independent programming language that is ideally suited for managing clinical trial data. It was developed by Ross Ihaka and Robert Gentleman in the 1990s for data handling, effective data cleaning, subsequent analysis, and clear presentation of results.

"The Technology space continues to expand, and it is paramount that we stay ahead, in terms of learning curve, in order to take advantage of the cutting-edge solutions that are required for Data science problem solving."

R Programming in clinical trial data analysis

Published in BioSpectrum India, 23 July 2020

Shrishaila Patil,

Vice President, Statistical Programming,
Navitas Data Sciences (A part of Navitas Life Sciences)

R Programming has multiple benefits, including:

  • Presence of integrated tools for sharing results
  • Availability of specially developed packages
  • Utilization across multiple platforms, from Linux to UNIX to Windows

Benefits of Using R Programming to Communicate Results

  • Advanced web-based dashboards and user interfaces
  • Reports can be viewed even if the program is not installed
  • Prior knowledge of R is not required to use the reports
  • Analysis can be converted easily into high-quality documents and presentations
  • Analysis can be conducted automatically at predefined times

Championing Transformations in Clinical Trials

There is a need to identify vital solutions and to make fundamental and lasting changes to navigate the ever-changing clinical trial ecosystem. Shifts in solutions are vital for a successful transition to enhanced efficiency. Our experts, Shrishaila Patil, Vice President, Statistical Programming, and Troy Ruth, Director, Clinical Reporting, provided us with interesting insights about data management in clinical trials.

Tell us a bit about your professional background

Shrishaila Patil (SP): I have a Master’s in Biotechnology from Bangalore University and more than 17 years of experience in Drug Development. I am a “CDISC Volunteer” and “PhUSE India officer”, and have supported the “R Package Validation Framework” and “Open Source Technologies for Regulatory Submissions” projects in the PhUSE working group “Data Visualization and Open Source Technology in Clinical Research (DVOST)”.

I have authored an international book, “FDA Clinical Outcome assessments and CDISC QRS supplements”, and have experience working with various analytical tools such as SAS (Base and Advanced Certified), R, and Python, and with CDMS tools such as Inform (EDC), Medidata Rave (EDC), Clintrial, and Oracle Clinical LSH.

What are some of the benefits of using R Programming?

Some benefits include:

  • Cost: Lower cost of use, though an initial investment may be necessary.
  • Training/familiarity: Recent graduates are more likely to be familiar with R or Python than with proprietary software.
  • Innovation: Thousands of R packages are available. Interactive data visualizations and dashboards created with Shiny are increasingly common.
  • Open-source: Open-source solutions can be shared more readily, e.g. through CRAN.
  • Performance: It is easier to use open-source tools.

What are the benefits of outsourcing data management to Navitas Life Sciences?

Navitas Life Sciences has deep expertise in leading and successfully completing clinical trials across key therapeutic areas such as Oncology and Epidemiology, with established strategies and the capability to support Decentralized and Hybrid Trials. Skilled resources and Data Scientists are available with knowledge of implementing cutting-edge technologies such as Artificial Intelligence (AI)/Machine Learning (ML) and other automation strategies.

We have Clinical Data Management Subject Matter Experts who can deal with new technology and adapt to new processes, and we have the capability to manage Big Data. In addition to EDC, we have the capability to manage efficient integration and processing of data from multiple sources like Real World Data (RWD), biomarkers, genomics, imaging, video, sensors, and wearables (i.e. sequenced data), structured and unstructured data.

We have in place efficient risk management capabilities built around data-driven decision making, with live dashboards that reveal patterns and risks. There is an end-to-end data standardization and integration strategy that considers all the dimensions of clinical data.

Why is data management in clinical trials a complex activity?

Clinical Data Management (CDM) is undergoing a paradigm shift. Advances in technology (digitization), coupled with growing complexity in clinical trials and the acceleration of decentralized trials, necessitate a move from traditional clinical data management practices to advanced clinical data science processes.

The right focus on CDM will help in ensuring data credibility and reliability, in addition to data integrity. We have seen clinical trial designs evolve over time to adapt and increase effectiveness, as with Adaptive trials and Virtual trials. CDM processes have accordingly evolved over the last two decades, from paper-based clinical data collection systems to Electronic Data Capture (EDC) systems, along with many other changes. Now, we are seeing further transformation in terms of support for a non-EDC-centric approach involving multiple data collection instruments, eSources, mobile technologies, etc.

Sensors and wearables generate a high volume of data (millions to trillions of times more than EDC) at high velocity (i.e. generated continuously, multiple times per second). In this context, traditional CDM processes will not be viable. Upskilling of resources to support this shift from a “Data Management” to a “Data Science” approach is the need of the hour.
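
To give a rough sense of scale for data that is generated continuously, multiple times per second, the arithmetic can be sketched as below. This is written in Python for illustration only, and every figure in it (sampling rate, wear period, number of subjects) is an assumption chosen for the example, not a value from a real study.

```python
# All figures below are illustrative assumptions, not values from the article.
SAMPLING_RATE_HZ = 50            # assumed sensor readings per second
SECONDS_PER_DAY = 24 * 60 * 60
STUDY_DAYS = 90                  # assumed wear period per subject
SUBJECTS = 200                   # assumed number of enrolled subjects

# Readings produced by one subject's device in a single day.
readings_per_subject_day = SAMPLING_RATE_HZ * SECONDS_PER_DAY

# Readings across all subjects for the full wear period.
total_readings = readings_per_subject_day * STUDY_DAYS * SUBJECTS

print(f"{readings_per_subject_day:,} readings per subject per day")
print(f"{total_readings:,} readings for the whole study")
```

Even under these modest assumptions, a single device produces millions of readings per day, which is far beyond what manual, form-by-form review can handle.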

Tell us a bit about your professional background

Troy Ruth (TR): I have a B.S. in Computer Science from Drexel University, Philadelphia, Pennsylvania. I have 30 years of industry experience, including clinical systems and tools development, EDC data collection and management operations, and statistical programming. I have been with Navitas Data Sciences and its predecessor, DataCeutics, for the past 18 years. I have been programming with SAS from 1991 to the present. I have also spent time developing and coding with Visual Basic, SQL, and JavaScript, among other languages. I have presented at PharmaSUG, NESUG, and WUSS SAS conferences.

How is the pharma industry progressing towards accepting and using R programming?

Many of the largest pharma and biotech companies are building out a full infrastructure for open-source tools, especially R. This is the step that is needed to provide the rigor required for validated environments and to provide the appropriate confidence that open-source tools are trustworthy and reliable. As a Functional Service Provider, Navitas Data Sciences has found it difficult or impossible to create its own opportunities to use R while supporting these companies. However, since they are now actively implementing these environments and have signaled a willingness to blend R into the work stream, we can be ready to look for opportunities to use the efficiencies of R, both as a company and at the individual level as a staff of statistical programmers.

Everyone in this role must take on the personal challenge of upskilling to be ready to use R when the moment strikes. Sometimes a hammer is the best tool and sometimes a screwdriver is the right choice for a task at hand. It is similar with SAS and R, but until recently we have only had the figurative “hammer” in our tool kit.

Please give us a glimpse of your role at Navitas Life Sciences

I am a Statistical Programmer/Analyst (SPA), supporting Clinical Reporting within Navitas Data Sciences. It is the Biostatistician's role to determine the needs and appropriate analysis methods to prove the safety and efficacy of new drugs and therapies being tested in clinical trials. These analysis goals are compiled into the formal Statistical Analysis Plan (SAP). It is the job of the SPA to tabulate the raw data into standardized data sets referred to as SDTM. Next, the same or a different SPA will create analysis-ready data sets from the tabulated data, called ADaM data sets. While the SDTM data sets are highly standardized and have many expected components, the ADaM data sets are customized to include the data derivations necessary to support the specific analyses dictated by the SAP. The organization of these data sets is standardized, but the derivation methods can vary widely from project to project, based on the design and protocols for an individual trial. After the ADaM data sets are available, the actual clinical reporting is conducted through the programming of tables, listings, and figures prescribed by the SAP.
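
The step from tabulated SDTM data to an analysis-ready ADaM data set can be sketched as follows. This is a simplified illustration in Python, not the process used on any actual project: the records are made up, the function name derive_advs is hypothetical, and only one derivation (change from baseline) is shown. The variable names follow common CDISC conventions (USUBJID, VSTESTCD, VSSTRESN, VSBLFL in SDTM; PARAMCD, AVAL, BASE, CHG in ADaM).

```python
# Hypothetical SDTM-style vital signs (VS) records; all values are made up.
sdtm_vs = [
    {"USUBJID": "001", "VSTESTCD": "SYSBP", "VISITNUM": 1, "VSSTRESN": 140.0, "VSBLFL": "Y"},
    {"USUBJID": "001", "VSTESTCD": "SYSBP", "VISITNUM": 2, "VSSTRESN": 132.0, "VSBLFL": ""},
    {"USUBJID": "002", "VSTESTCD": "SYSBP", "VISITNUM": 1, "VSSTRESN": 150.0, "VSBLFL": "Y"},
    {"USUBJID": "002", "VSTESTCD": "SYSBP", "VISITNUM": 2, "VSSTRESN": 155.0, "VSBLFL": ""},
]


def derive_advs(records):
    """Build ADaM-style rows with AVAL, BASE and CHG (change from baseline)."""
    # Baseline value per subject and test, taken from the flagged record.
    baseline = {
        (r["USUBJID"], r["VSTESTCD"]): r["VSSTRESN"]
        for r in records
        if r["VSBLFL"] == "Y"
    }
    adam = []
    for r in records:
        base = baseline.get((r["USUBJID"], r["VSTESTCD"]))
        is_baseline = r["VSBLFL"] == "Y"
        adam.append({
            "USUBJID": r["USUBJID"],
            "PARAMCD": r["VSTESTCD"],
            "AVISITN": r["VISITNUM"],
            "AVAL": r["VSSTRESN"],
            "BASE": base,
            # CHG is left empty on the baseline row and when no baseline exists.
            "CHG": None if is_baseline or base is None else r["VSSTRESN"] - base,
        })
    return adam


advs = derive_advs(sdtm_vs)
for row in advs:
    print(row)
```

Whether CHG is left empty or set to zero on the baseline record is a project-level convention; the sketch leaves it empty. In practice, rules like this come from the SAP and the ADaM specifications for the study.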

Often overlooked is the “Analyst” part of the SPA role. It is not enough to simply use a set of specifications to carry out programming to produce a deliverable of some sort. The SPA must think critically at each point in the process. They must consider whether the data, as collected, is reasonable, in the expected unit of measure, and complete. They must be able to understand the derivations and goals of the SAP so that process errors can be identified and shared with the Biostatistician and study team. Without critical evaluation at each step, defects in the process will propagate and will become costly to fix late in the process. Therefore, the SPA must own their role and strive to understand the big picture, while being responsible for executing the low-level details.
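
The three questions raised above (is the data reasonable, in the expected unit, and complete?) lend themselves to simple programmatic checks. The following Python sketch is illustrative only: the records, the expected unit, and the plausible range are assumptions made for the example, and the review function is hypothetical rather than part of any standard tool.

```python
# Hypothetical weight records; the expected unit and plausible range are assumptions.
EXPECTED_UNIT = "kg"
PLAUSIBLE_RANGE = (30.0, 250.0)

records = [
    {"USUBJID": "001", "value": 72.5, "unit": "kg"},
    {"USUBJID": "002", "value": 165.0, "unit": "lb"},   # unexpected unit
    {"USUBJID": "003", "value": None, "unit": "kg"},    # missing value
    {"USUBJID": "004", "value": 720.0, "unit": "kg"},   # implausible value
]


def review(record):
    """Return a list of issues found in one record (empty if it looks clean)."""
    issues = []
    if record["value"] is None:
        issues.append("missing value")
    if record["unit"] != EXPECTED_UNIT:
        issues.append(f"unexpected unit {record['unit']!r}")
    elif record["value"] is not None:
        # The range check only makes sense once the unit is confirmed.
        low, high = PLAUSIBLE_RANGE
        if not low <= record["value"] <= high:
            issues.append("value outside plausible range")
    return issues


# Keep only the subjects with at least one finding.
findings = {r["USUBJID"]: review(r) for r in records if review(r)}
for subject, issues in findings.items():
    print(subject, issues)
```

Checks like these only flag candidates for review. Deciding whether a flagged value is a true error still requires the judgment of the SPA, the Biostatistician, and the study team.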

Don’t miss the exciting insights about R programming. Register for our Live Webinar

“Enhance your Data Science toolkit with open source tools – How R can create efficiencies in data review and insight”

Date: 07 October, 2021

Time: 10:00 – 11:00 EST

The webinar will explore the following key topics:

  • Alternative solutions to unlocking valuable insights from rich and diverse data
  • Statistics, Analytics, and Visualization – complementing the use of SAS with R
  • Ensuring regulatory compliance when using open source software
  • Industry innovation and collaboration to develop better technology ecosystems

The On-Demand version of this webinar is now available.


Shrishaila Patil

Vice President, Statistical Programming
Navitas Data Sciences

Troy Ruth

Director, Clinical Reporting
Navitas Data Sciences

To find out more about our services and solutions, reach out to us.