Notice C

Promoting the collaborative development of proposals for investments in digital health global goods

Making the most widely-deployed ODK tools better Global Goods

Notice C Opportunity: 
Announcement C0: Global Good Software Development and Support
Application Status: 
Approved - partially funded

Executive Summary

The Open Data Kit (ODK) community produces free and open-source software for collecting, managing, and using data in resource-constrained environments. The tools are primarily used by health organizations to collect data quickly, accurately, offline, and at scale. This effort focuses on the ODK 1 suite of tools (Collect, Aggregate, Central, XLSForm, Build, JavaRosa) which are widely-deployed global goods that have been used to collect billions of data points. Example projects include:

  • For governments working to end polio, access to accurate and timely information makes a world of difference. ODK is used in Jordan, Afghanistan, Pakistan, Somalia, and South Sudan as a key tool in mass polio vaccination campaign quality control. https://www.youtube.com/watch?v=zROyvrvt-zk
  • PMA2020 uses ODK to collect a nationally representative sample of data from households and service delivery points in selected sentinel sites. The data is used to estimate health indicators on an annual basis in 11 pledging FP2020 countries. https://pma2020.org/what-we-do

ODK has been designed for novice users in challenging environments and its robustness in these environments has driven the platform’s adoption and evolution. Additionally, the choice to build an active open-source community around ODK has allowed it to benefit from users, implementers, and developers.

Over the last year, the ODK project has transitioned from a single “owner” to community governance. Under the leadership of the ODK 1 Technical Steering Committee (TSC), the ODK 1 user base has experienced extraordinary growth during that time (the forum has grown 166% to almost 8K users, and mobile client actives have grown 140% to 170K users). We wish to build on that growth and deliver improvements to ODK Collect (mobile app), ODK Aggregate (server), and our user documentation.

Consortium Team

Nafundi is a software company started by the founders of ODK. Nafundi’s leadership, Dr. Yaw Anokwa and Ms. Hélène Martin, lead much of the software development and community activities on the ODK 1 tools. Nafundi will be the lead organization and will provide project management and software development.

eHealth Africa (eHA) builds stronger health systems through data-driven solutions. Adam Butler is the Technical Team Lead at eHA and has experience building solutions from platforms like ODK, CKAN, OpenHIE, DHIS2, and OSM. eHA will provide project management and software development.

Biostat Global Consulting offers a broad array of services in survey sampling, study design, data management and cleaning, statistical analysis, and conveying results clearly to technical and non-technical audiences. Dale Rhoda from Biostat is deeply familiar with ODK Collect and the challenges of data entry errors.

Nafundi and eHA participate on the ODK 1 TSC and have used an open and collaborative roadmapping process to improve ODK 1 tools over the last year. Dale Rhoda, a principal at Biostat, has worked with both Nafundi and eHA on reducing data entry errors in ODK Collect.


Project Description

The consortium, with consultation from the ODK 1 TSC, has identified four work packages where support from Digital Square would enable the ODK 1 community to address well-articulated, but under-resourced needs.

Per ODK 1’s governance at https://github.com/opendatakit/governance/tree/master/TSC1, our proposed activities are sourced from community discussions at http://forum.opendatakit.org and filed issues on https://github.com/opendatakit.

For each software development activity, the consortium will add detail to each filed issue to enable it to be closable by a pull request (a code submission). Each pull request will be reviewed by another core contributor and tested by the existing test team. Once reviewed and tested, the code will be merged. For software artifacts, a beta build will be shared with the community, and if no issues are found, a production build will be released. Non-software artifacts (e.g., docs) are released continually.

For each activity, we will publish a tentative plan and a call for contributors on our community forum and social media. The consortium will facilitate those contributors and serve as backstops to ensure an ongoing and consistent process.

Work Package 1: Improving ODK Collect for disease surveillance

ODK Collect has seen a 140% increase in its user base over the last year and now processes millions of submissions a month. Nafundi has led this core software development, and the ODK 1 TSC would like to take this funding opportunity to broaden the core contributor base by resourcing another organization to deepen its contributions while improving ODK Collect’s functionality in disease surveillance.

Our approach for this work package is to take on complex features that require deep changes to ODK Collect, ODK JavaRosa, and the XForms specification. To ensure completion, we wish to resource a committed and experienced developer from eHA to implement these features:

  • Fields dependent on earlier field not updated
  • Remember previously entered value
  • Group related text or numeric inputs into a grid
  • Refinements to repeat group navigation
  • Specification for lightweight case-management
  • Addressing data accuracy issues identified in Work Package 2

eHA will be responsible for delivering these changes with Nafundi assisting.

Work Package 2: Improve data entry accuracy across all ODK Collect widgets

In 2016, Biostat Global conducted a data entry experiment with eHA in Nigeria on ODK Collect. The study characterized data entry error rates for entering 10,000 known dates using a combination of factors. The experiment showed high data entry error rates for the common date interfaces in ODK Collect. A presentation on methods and results is available here. Shortly after the results were released, Nafundi, with guidance from Biostat Global, led the community in improving the date interfaces.

We would like to use this work package to extend the data entry work to rigorously study:

  • Changes that the ODK community made to date widgets - Did they improve error rates?
  • Other question types of high importance - Dates are the most important element in some types of surveys, but other surveys rely on other question types to calculate the main outcomes. We will use subsets of questions from the UNHCR Standardized Expanded Nutrition Survey (SENS) and PMA 2020 (and hopefully use data collectors from those projects) to enter many thousands of known responses and assess entry error rates.
  • How do error rates vary across teams and team members in several data collection countries?

Biostat Global will design the experiments and work with data collection teams from several countries to deliver results, publish recommendations for design changes that the ODK community should make, and work with eHA’s developers from Work Package 1 to address the most serious drivers of data errors.

Work Package 3: Making ODK Aggregate more maintainable

There are high-value changes to ODK Aggregate that would enable maintainers to better understand usage patterns and address many of the common problems that users encounter when using the software. For this work package, we will deliver the following:

  • Usage analytics to better inform maintainers about usage patterns
  • Improved error messages that point to common resolutions to reduce support burden
  • Removed all deprecated functionality (e.g., Google Accounts, ODK2)
  • Reworked help system that leverages new docs website

Our anticipated outcomes are greater insights into which features of Aggregate are important to improve and which could be deprecated. We also expect to dramatically reduce the support questions around Aggregate.

Nafundi will be responsible for delivering these changes.

Work Package 4: Improving user documentation and docs process

We measure the effectiveness of our user documentation by the support questions that could be answered with a link to the documentation. Maintainers file issues against the docs when community members are not able to answer support questions with a link to the docs.

While we have made great strides over the last year by launching a new ODK documentation website at http://docs.opendatakit.org, there are gaps around docs that require deeper knowledge of the tools than is typically readily available in our community. For this work package, we will deliver the following documents:

  • How form versioning works
  • Explain external data tradeoffs
  • Google Drive/Sheets as a lightweight server

We will also seek to make our documentation easier to contribute to by reworking our contribution process and addressing the docs backlog.

eHA will be responsible for delivering these changes with Nafundi assisting.


Use Cases, User Stories

Work Package 1: Improving ODK Collect for disease surveillance

This work package is motivated by problems seen when deploying ODK Collect for disease surveillance. Our high-level user goals are to solve long-standing issues and thereby increase data accuracy: providing enumerators with important context during data entry, reducing fatigue by automating repetitive or manual tasks, grouping related questions in more natural grid views, and improving navigation and management of repeating elements.

“Fields dependent on earlier field not updated” is described in more detail at https://github.com/opendatakit/collect/issues/378. Resolving this issue will:

  • Help enumerators understand better what dependent questions are asking by placing the questions on the same screen
  • Help enumerators reduce data loss when questions appear and disappear for no apparent reason
  • Help managers reduce training by relying on behavior that is very common across other data collection tools in the ODK ecosystem

“Remember previously entered value” is described in more detail at https://forum.opendatakit.org/t/9116. Resolving this issue will:

  • Help enumerators reduce fatigue by not having to enter the same data multiple times
  • Help managers reduce data variation by ensuring data is only entered once
  • Provide designers with a light-weight mechanism to share data across multiple forms

“Group related text or numeric inputs into a grid” is described in more detail at https://forum.opendatakit.org/t/13398. Resolving this issue will:

  • Ensure enumerators can more easily collect several values of the same kind
  • Reduce the need for designers to design complex and hard-to-maintain forms
  • Lay the groundwork for designers to add numeric or textual entry with units

“Refinements to repeat group navigation” is described in more detail at https://forum.opendatakit.org/t/11792. Resolving this issue will:

  • Provide enumerators with a more intuitive navigation and management of repeated form elements
  • Help managers reduce training around use of repeating form elements
  • Provide designers with a lightweight mechanism to gain case-management without designing multiple forms

“Specification for lightweight case-management” is described in more detail at https://forum.opendatakit.org/t/6827. Resolving this high-risk but valuable issue will:

  • Communicate a clear vision to the community about how lightweight case-management will be handled
  • Provide a specification and plan for implementation that does not disrupt the existing ecosystem
  • Provide an initial implementation, if possible

Work Package 2: Improve data entry accuracy across all ODK Collect widgets

Survey implementers, including eHA in Nigeria and Data Management Aid in Bangladesh, sometimes implement World Health Organization Vaccination Coverage Cluster Surveys using ODK.

One key element of those surveys is to record the child’s date-of-birth and every date on which they were vaccinated. These dates are usually copied from either a home-based or facility-based health record. The survey analysis uses dates to compute the ages at which children receive each dose and compares the distribution of ages as observed with the recommended age from that country’s vaccination schedule.

The data help program managers assess whether doses are commonly being administered too early or too late and whether providers are giving all of the doses that the child is eligible for on every visit. Identifying facilities and regions that frequently experience so-called “missed opportunities” for simultaneous vaccination can represent low-hanging fruit for improving vaccination coverage. If the workers there can be trained to identify all of the doses that the child is eligible for, those doses can easily be administered. After all, the largest hurdle has already been cleared: the child is present at the vaccination facility and has received at least one dose, so the remaining task is to be sure that they receive all of the doses that they are eligible for that day.

But the missed opportunities indicator is very sensitive to data entry errors. If a vaccination date is mis-entered, then it will look to analysts as though the child visited a facility and received only one dose on a day, when in fact they may have also received several others whose dates were entered correctly. Entry errors in the dose dates are bad, but errors in the date-of-birth are even more consequential. If that date is wrong, then all the calculations about the child’s age when they received every dose will be wrong. So to draw correct conclusions about how a country’s vaccination delivery system is performing, in terms of timeliness and simultaneity of administering doses, it is crucial that field data collectors be able to enter dates correctly.
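As a minimal sketch of how such errors propagate, the following Python computes a child’s age at each dose and the number of apparent visits. The dose names, dates, and helper functions are hypothetical and chosen only for illustration; they are not taken from any survey dataset or analysis tool.

```python
from datetime import date


def ages_at_doses(date_of_birth, dose_dates):
    """Return the child's age in days on the date each dose was recorded."""
    return {dose: (given - date_of_birth).days for dose, given in dose_dates.items()}


def apparent_visits(dose_dates):
    """Count distinct dates; each distinct date looks like a separate visit."""
    return len(set(dose_dates.values()))


# Hypothetical record: two doses given at the same visit.
true_doses = {"penta1": date(2017, 3, 1), "opv1": date(2017, 3, 1)}
# Same record with one dose date mis-entered (day 1 typed as day 11).
entered_doses = {"penta1": date(2017, 3, 1), "opv1": date(2017, 3, 11)}

true_dob = date(2017, 1, 18)
entered_dob = date(2017, 1, 8)  # date of birth mis-entered by ten days

# A wrong dose date turns one visit into two apparent visits.
print(apparent_visits(true_doses), apparent_visits(entered_doses))

# A wrong date of birth shifts the computed age at every dose.
correct_ages = ages_at_doses(true_dob, true_doses)
shifted_ages = ages_at_doses(entered_dob, true_doses)
print({dose: shifted_ages[dose] - correct_ages[dose] for dose in true_doses})
```

In the mis-entered record the child appears to have received only one dose on each of two days, which an analyst would count as a missed opportunity even though both doses were given together.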

When data are collected on paper forms or photographs it is common for data entry firms to guarantee error rates smaller than one error in every hundred data elements when they perform double-data entry using keyboards. By contrast, in the Biostat Global / eHA experiment in Nigeria, errors were observed in 10% of dates entered in the configuration that most closely matched those commonly used in practice.
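To give a rough sense of what these per-date rates mean for a whole record, the sketch below computes the chance that a record containing several dates has at least one entry error. It assumes errors are independent and equally likely for every date, which is a simplification and not a finding of the experiment; the five-date record is hypothetical.

```python
def prob_any_error(per_field_error_rate, n_fields):
    """Probability that at least one of n_fields independent entries is wrong."""
    return 1 - (1 - per_field_error_rate) ** n_fields


# Compare a double-entry style rate (1 in 100) with the 10% observed rate,
# for a hypothetical record holding five dates.
for rate in (0.01, 0.10):
    share = prob_any_error(rate, 5)
    print(f"{rate:.0%} per date, 5 dates: {share:.1%} of records affected")
```

Under these assumptions, a per-date error rate of 10% leaves roughly four in ten five-date records with at least one wrong date, compared with about one in twenty at a 1% rate.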

For vaccination program managers around the world to trust the results of date-based analyses, we need to confirm that improvements to the interfaces, and maybe instructions, can yield error rates as low as those available with keyboard data entry.

So this work package is informed by problems seen when studying error rates in ODK Collect. Our high-level user goals are to ensure that the various user interface choices have not introduced data accuracy problems.

As part of this Work Package, we will have documented, for the first time using an experimentally rigorous design and large sample size, baseline error rates for a set of fundamental and common ODK question types, using a variety of participants across a variety of teams. The results will help the ODK community decide whether to invest precious resources into additional measures to mitigate data entry errors.

We will publish the resulting dataset and error rates in an open forum so that other investigators who use ODK can run simulations with their own data to understand the implications of entry errors in their own work.

We will have generated a set of experimental artifacts (XLSForms, PDFs of faux respondent responses, computer code for scoring the errors) that can be easily used or modified to repeat or extend the experiment with other data collection partners.
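The scoring step could look something like the following sketch, which compares each entered response with the known response and reports an error rate per group. The field names (team, question, expected, entered) and the sample rows are hypothetical; the actual scoring code and data layout will be defined as part of the experiment.

```python
from collections import defaultdict


def error_rates(entries, key):
    """Return the share of mismatched entries for each value of `key`.

    `entries` is an iterable of dicts holding the known response
    ('expected'), the typed response ('entered'), and grouping fields.
    """
    errors = defaultdict(int)
    totals = defaultdict(int)
    for entry in entries:
        group = entry[key]
        totals[group] += 1
        if entry["entered"] != entry["expected"]:
            errors[group] += 1
    return {group: errors[group] / totals[group] for group in totals}


# Hypothetical responses, for illustration only.
entries = [
    {"team": "A", "question": "dob", "expected": "2017-01-18", "entered": "2017-01-18"},
    {"team": "A", "question": "dob", "expected": "2017-02-03", "entered": "2017-03-02"},
    {"team": "B", "question": "muac_mm", "expected": "125", "entered": "125"},
    {"team": "B", "question": "muac_mm", "expected": "131", "entered": "131"},
]

by_team = error_rates(entries, "team")
by_question = error_rates(entries, "question")
print(by_team)
print(by_question)
```

Grouping by a different field (participant, country, widget type) only requires passing a different key, which keeps the same scoring code reusable across rounds of the experiment.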

The data collected here can help inform the design of future experiments, if this experiment shows that additional work is needed to improve interfaces, instructions, or incentives. If additional work is needed, then future experiments will be needed, and the error rates and their correlation structure across participants and teams will be very helpful for planning the size of those endeavors.
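One common way that correlation within participants or teams feeds into sample size planning is the design effect, 1 + (m - 1) × ICC, where m is the number of entries per cluster and ICC is the intra-cluster correlation. The sketch below applies that standard formula; it assumes equal-sized clusters, uses hypothetical numbers, and is not necessarily the method the consortium will use.

```python
import math


def design_effect(cluster_size, icc):
    """Variance inflation from clustering, assuming equal-sized clusters."""
    return 1 + (cluster_size - 1) * icc


def inflated_sample_size(n_if_independent, cluster_size, icc):
    """Entries needed once within-cluster correlation is accounted for."""
    return math.ceil(n_if_independent * design_effect(cluster_size, icc))


# Hypothetical planning inputs: 1,000 entries would suffice if all entries
# were independent, each participant types 9 entries, and the ICC is 0.125.
needed = inflated_sample_size(1000, cluster_size=9, icc=0.125)
print(needed)
```

The larger the estimated correlation among a participant’s entries, the more entries a follow-up experiment would need for the same precision.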

Work Package 3: Making ODK Aggregate more maintainable

This work package is informed by problems seen while maintaining ODK Aggregate. Our goals are to reduce the maintenance burden so the developer community can more quickly and more confidently make changes that will provide the most value to the hundreds of thousands of users who have downloaded ODK Aggregate.

“Usage analytics to better inform maintainers” is described in more detail at https://github.com/opendatakit/aggregate/issues/309. Resolving this issue will:

  • Provide project leadership with a privacy-preserving mechanism of measuring usage and impact
  • Increase developer confidence about which features should be added, removed, or changed

“Improved error messages” is described in more detail at https://github.com/opendatakit/aggregate/issues/244 and https://github.com/opendatakit/aggregate/issues/248. Resolving these issues will:

  • Reduce implementer frustration by providing immediate solutions to common problems
  • Reduce community support burden by ensuring implementers can solve own problems

“Removed all deprecated functionality” is described in more detail at https://github.com/opendatakit/aggregate/issues/286 and https://github.com/opendatakit/aggregate/issues/287. Resolving these issues will:

  • Reduce implementer frustration from features which are not functional or supported
  • Reduce implementer security risk from unmaintained and untested code

“Reworked help system that leverages docs” is described in more detail at https://github.com/opendatakit/aggregate/issues/311. Resolving this issue will:

  • Enable implementers to have access to the most up-to-date documentation
  • Enable contributors to add to documentation without being a Java developer

Work Package 4: User documentation

This work package is informed by gaps in documentation for complex workflows that ODK users are likely to run into. Rather than only writing the documentation, we wish to use this Work Package to improve the contribution process and enable more of the community to contribute documentation.

“Reworking contribution process” is described in more detail at https://github.com/opendatakit/docs/issues/823. The community has shown a desire to contribute documentation, but has been discouraged by the difficulty of the contribution process. Resolving this issue will reduce the work required for non-technical contributors to contribute to ODK’s docs.

“Reducing the backlog” is described in more detail at https://forum.opendatakit.org/t/14654/3. The community has shown a desire to reduce the documentation backlog, but there has not been staff to drive a concerted effort. Resolving this issue will help users make the most out of ODK.

“How form versioning works” is described in more detail at https://github.com/opendatakit/docs/issues/645. Resolving this issue will explain to implementers how to safely and quickly add or remove questions from forms.

“Explain external data tradeoffs” is described in more detail at https://github.com/opendatakit/docs/issues/73. Resolving this issue will enable implementers to understand what the various external data mechanisms are, why they came to exist, and the tradeoffs of each.

“Google Drive/Sheets as a lightweight server” is described in more detail at https://github.com/opendatakit/docs/issues/748. Resolving this issue will enable implementers to set up a lightweight backend that does not require the relatively heavy infrastructure of ODK Aggregate or ODK Central.


Digital Health Technologies

The key digital health tool the project will be investing in is ODK and its ecosystem. ODK 1 tools have become the standard for non-routine data collection and management, and for this effort we’ll focus on the ODK suite’s most popular tools, ODK Collect and ODK Aggregate.

User facing tools

  • ODK Collect is an Android Java app that renders forms that comply with the ODK XForms spec. It is powered by ODK JavaRosa and is a client to ODK servers.
  • ODK Aggregate is a Java server that can be deployed on Tomcat or Google App Engine. It can be backed by PostgreSQL, MySQL, MS SQL Server, or Google Cloud Datastore.

Libraries and specifications

  • OpenRosa, APIs for how ODK clients communicate with ODK servers.
  • ODK XForms spec, a subset of the W3C XForms specification, for use in the ODK ecosystem.
  • ODK JavaRosa, a Java library that renders forms complying with ODK XForms.
  • XLSForm spec, a high-level Excel-based form specification.
  • pyxform, a Python library that converts XLSForms into ODK XForms.


Community Feedback

The ODK forum has almost 8,000 members who are familiar with our public feature development process. The consortium will use the forum to engage with the broader community.

For each feature, the consortium will describe the background, goal, non-goals, user stories, and user interaction on the community forum and solicit and facilitate feedback. This feedback will be gathered on an ongoing basis until the feature is ready to be specified. The consortium will leverage its existing relationships and social media to ensure end users are aware of this process.

Once a feature is ready to be specced, it will be moved to the relevant GitHub repository where it will receive feedback from our more technical community members, including ODK’s technical leadership. As the feature is being built, the consortium will encourage feedback and review from the broader developer community. This feedback will be gathered on an ongoing basis until the feature is ready to be built.

One to two weeks before a tool release, a beta release will be announced on the ODK forum and via social media. Hundreds of users typically participate in betas and the consortium will gather their feedback. Alphas and betas happen monthly and adjustments are made until users report no problems.

To gather ongoing feedback during this process, the consortium will rely on our existing meetings. During our monthly developer meetings, the consortium will invite developers from the broader community to provide feedback. During our biweekly TSC meetings, the consortium will include an agenda item to provide detailed technical feedback on ongoing work on this effort.


Self-Assessment on the Global Goods Maturity Model

Our self-assessment (with rationale) is available at https://docs.google.com/spreadsheets/d/1jixQ-42cfwGRNQb7XI4PAy6I4FhZTRcZy05rVQH9Rxw and the results are below.

 

Digital Health Atlas

The consortium lead has selected three polio projects in three countries. These projects showcase the use of the core ODK 1 technologies in data collection and management for polio surveillance.


Workplan and Schedule

Our workplan and schedule for each Work Package is shown below.

 

Work Package 1: Improving ODK Collect for disease surveillance
Responsible | Activity (scheduled across M1-M12)
Tech Lead (eHA) | Identify and recruit Android Developer
Tech Lead (eHA) | Work with ODK TSC to produce syntax spec for grid layouts
Tech Lead (eHA) | Work with ODK TSC to evaluate possible approaches to updating of dependent fields
Tech Lead (eHA) | Work with ODK TSC to finalize spec for remembering values
Tech Lead (eHA) | Iterate on case management specification with ODK TSC
Tech Lead (eHA) | Work with Biostat to specify changes based on outputs of WP2
Tech Lead (Nafundi) | Provide technical guidance on specification and implementation
Android Developer (eHA) | Group related text or numeric inputs into a grid
Android Developer (eHA) | Fields dependent on earlier field not updated
Android Developer (eHA) | Remember previously entered value
Android Developer (eHA) | Lightweight case-management
Android Developer (eHA) | Addressing data accuracy issues (based on outputs of WP2)
Android Developer (Nafundi) | Refinements to repeat group navigation

 

 

 

Work Package 2: Improve data entry accuracy across all ODK Collect widgets
Responsible | Activity (scheduled across M1-M12)
Technical Lead (Biostat) | Make questionnaire
Mid-level statistician (Biostat) | Make respondents set
Technical Lead (Biostat) | Make experiment design
Mid-level statistician (Biostat) | Compile experiment responses
Senior statistician (Biostat) | Make XLSForm and set up server
Technical Lead (Biostat) | Recruit up to 8 teams to conduct the first round
Mid-level statistician (Biostat) | Materials to train coordinators
Mid-level statistician (Biostat) | Materials to orient participants
Technical Lead (Biostat) | Train onsite coordinators
Onsite coordinators | Coordinators recruit participant and proctor experiment
Senior statistician (Biostat) | Pull data and compare with expectations
Senior statistician (Biostat) | Document observed error rate and variability
Technical Lead (Biostat) | Prioritize changes with devs
Technical Lead (Nafundi) | Make changes to the top priority widgets
Mid-level statistician (Biostat) | Generate PDFs for next round
Onsite coordinators | Repeat fieldwork
Senior statistician (Biostat) | Repeat coordinator clarification
Senior statistician (Biostat) | Repeat analysis
Technical Lead (Biostat) | Document for sponsor review and publication

 

Work Package 3: Making ODK Aggregate more maintainable
Responsible | Activity (scheduled across M1-M12)
Tech Lead (Nafundi) | Work with ODK TSC to evaluate approaches on analytics, error messages, and deprecations.
Tech Lead (Nafundi) | Work with ODK TSC to evaluate approaches on removing help system
Java Developer (Nafundi) | Usage analytics to better inform maintainers
Java Developer (Nafundi) | Improved error messages
Java Developer (Nafundi) | Remove all deprecated functionality
Java Developer (Nafundi) | Reworked help system that leverages docs

 

Work Package 4: Improving user documentation and docs process
Responsible | Activity (scheduled across M1-M12)
Docs Lead (Nafundi) | Reworking the contribution process
Docs Lead (eHA) | Reducing the backlog
Tech Lead (Nafundi) | Provide technical guidance on backlog issues
Docs Lead (eHA) | How form versioning works
Docs Lead (Nafundi) | Explain external data tradeoffs
Docs Lead (Nafundi) | Google Drive/Sheets as a lightweight server


Project Deliverables and Timeframe

Below are our deliverables grouped by work package and partner. We provide estimated timeframe for delivery based on complexity of the deliverable and scheduling into ongoing work.

Deliverable | Lead | Q1 | Q2 | Q3 | Q4
WP1: Fields dependent on earlier field not updated | eHA | | | |
WP1: Remember previously entered value | eHA | | | |
WP1: Group related text or numeric inputs into a grid | eHA | | | |
WP1: Refinements to repeat group navigation | Nafundi | | | |
WP1: Addressing data accuracy issues | eHA | | | |
WP1: Lightweight case-management | eHA | | | |
WP2: Experimental materials | Biostat | | | |
WP2: Form designs and server setups | Nafundi | | | |
WP2: Experiment results made available | Biostat | | | |
WP2: Follow-up experiment results made available | Biostat | | | |
WP3: Usage analytics to better inform maintainers | Nafundi | | | |
WP3: Improved error messages | Nafundi | | | |
WP3: Removed all deprecated functionality | Nafundi | | | |
WP3: Reworked help system that leverages docs | Nafundi | | | |
WP4: Reworking contribution process | Nafundi | | | |
WP4: Reducing the backlog | eHA | | | |
WP4: How form versioning works | eHA | | | |
WP4: Explain external data tradeoffs | Nafundi | | | |
WP4: Google Drive/Sheets as a lightweight server | Nafundi | | | |


Tagging

“data collection, management, and use”, “ODK”, “Open Data Kit”, “data accuracy”, “documentation”, “usability”


2 sentence overview

Open Data Kit replaces paper forms and surveys with smartphones and tablets. It helps field workers collect data accurately and report results instantly.

This investment from Digital Square will be used to sustainably address long-standing issues with the most widely-deployed ODK tools and documentation.


Comments

Thanks for your submission and we are looking forward to your developed full proposal. It would be great to include some health-oriented use cases where the specific features mentioned would add value within the health domain.