User talk:Fæ/DrugStats


Shortcut: COM:DrugStats

Chart showing that commercial prices for Insulin glargine, used for management of diabetes, have trebled in the USA over ten years. The US patent will expire in 2027.

Introduction

The DrugStats site presents information about the most-prescribed drugs in the US in chart format. Two charts are displayed for each drug: total prescriptions per year (in millions) and drug costs per year. The costs chart shows both the price and the average 'out of pocket' cost, i.e. how much most people in the United States would have to pay towards the drug price after health insurance coverage.

The data is based on the Medical Expenditure Panel Survey (MEPS), which is itself free, with added "sanitation" of the data: buggy data is removed, and names are standardized so that drugs with an identical active ingredient are listed as synonyms.

Configuration and technical stuff

Costs chart for Brompheniramine Maleate; Codeine Phosphate; Phenylephrine Hydrochloride showing truncated name.

File names use the format:

<drug name> <chart type> (DrugStats).svg

All charts are added to Category:Charts from DrugStats.
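The naming and categorisation convention above can be sketched in a few lines of Python. This is an illustration only, not the code used by the upload script: the function name `chart_filename` and the chart type label "costs" are assumptions made for the example.

```python
# Category named on this page; every chart is added to it.
CATEGORY = "Category:Charts from DrugStats"


def chart_filename(drug_name: str, chart_type: str) -> str:
    """Build a name in the '<drug name> <chart type> (DrugStats).svg' format."""
    # Collapse stray whitespace so the name contains single spaces only.
    drug = " ".join(drug_name.split())
    kind = " ".join(chart_type.split())
    return f"{drug} {kind} (DrugStats).svg"


def category_wikitext() -> str:
    """Wikitext line that places a file page in the charts category."""
    return f"[[{CATEGORY}]]"


# "costs" is a hypothetical chart type label used only for this sketch.
print(chart_filename("Insulin glargine", "costs"))
print(category_wikitext())
```

The real chart type wording in uploaded file names may differ from the label used here.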

Only SVG is uploaded; reusers can use the standard interface to download PNG versions in different sizes if they wish. The SVG format is extremely compact, so file sizes are only a few kilobytes. Because SVG is plain text, reusers can adapt a chart fairly easily, either by editing the SVG code in any text editor or by using a vector editing tool such as the open source Inkscape editor.

Standard Plotly options are used, including the default grey background and grid style. The non-default changes are increased line width, dot size and font size, and a legend positioned inside the grid area. Where drug names are over 25 characters, the titles are slightly abbreviated. The data used to generate each chart is included on the image page, so it is easy for reusers to check it or create their own charts. The date given for the chart is the website page creation date, rather than the date the Plotly version was created.
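As a rough sketch of the kind of non-default settings described above, a Plotly figure can be written as plain dictionaries. The specific numbers (line width 4, marker size 10, font size 16, legend position) and the function name `build_chart_spec` are assumptions for illustration; the values used for the uploaded charts are not stated on this page. The sample data is made up.

```python
def build_chart_spec(title, years, values, series_name):
    """Return a Plotly figure specification as plain dictionaries.

    Everything not set here is left at Plotly's defaults, including the
    grey plot background and grid style.
    """
    trace = {
        "type": "scatter",
        "mode": "lines+markers",
        "x": list(years),
        "y": list(values),
        "name": series_name,
        "line": {"width": 4},    # assumed value, thicker than the default
        "marker": {"size": 10},  # assumed value, larger dots than the default
    }
    layout = {
        "title": {"text": title},
        "font": {"size": 16},    # assumed value
        "showlegend": True,      # Plotly hides the legend for a single trace otherwise
        # x and y are fractions of the plotting area, so this puts the
        # legend inside the grid, near the top left corner.
        "legend": {"x": 0.02, "y": 0.98, "xanchor": "left", "yanchor": "top"},
    }
    return {"data": [trace], "layout": layout}


# Placeholder numbers, not real DrugStats data.
spec = build_chart_spec("Example drug", [2010, 2011, 2012], [1.0, 1.5, 2.0],
                        "Prescriptions (millions)")
print(spec["layout"]["legend"])
```

With Plotly installed, the dictionary can be passed to `plotly.graph_objects.Figure(spec)`, and `write_image("chart.svg")` will save it as SVG if a static image export backend such as Kaleido is available.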

Where a drug has only one data point, it is skipped. Example
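A single data point cannot form a line chart, which is presumably why such drugs are left out. A minimal sketch of that filter follows; the function names and the shape of the data (a dictionary of drug name to a list of year and value pairs) are assumptions, and the drug names and numbers are placeholders.

```python
def has_enough_points(series):
    """True when a series has more than one (year, value) data point."""
    return len(series) > 1


def drugs_to_chart(all_series):
    """Keep only the drugs whose series can be drawn as a line chart."""
    return {drug: points for drug, points in all_series.items()
            if has_enough_points(points)}


# Placeholder data, not real DrugStats figures.
sample = {
    "Drug A": [(2010, 1.2), (2011, 1.4), (2012, 1.9)],
    "Drug B": [(2012, 0.3)],  # only one data point, so it is skipped
}
print(sorted(drugs_to_chart(sample)))
```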

Files are uploaded with this project page linked in the upload comment.

The Python Pywikibot source code that runs the upload is available on GitHub at upload_drugstats. Post-upload housekeeping tasks are at drugstats_housekeeping.

Titles

The title trimming is insufficient where a drug name is a sequence of several names. As a housekeeping task, chart titles over 50 characters are trimmed down to the first name in the semicolon-separated list. For example, "Brompheniramine Maleate; Codeine Phosphate; Phenylephrine Hydrochloride" is truncated to "Brompheniramine Maleate", though unfortunately there are three drug variants with this as the lead ingredient, so context would be critical. This task checks the SVG code directly to test whether the title has overrun in length, and edits that text.
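The housekeeping step described above can be approximated as follows. This is a sketch under stated assumptions rather than the code in drugstats_housekeeping: it assumes the chart title sits in a `<text>` element with the class `gtitle`, as in typical Plotly SVG exports, and the function name `trim_svg_title` is hypothetical.

```python
import re

# Assumption: the title is the <text> element carrying class "gtitle".
TITLE_RE = re.compile(
    r'(<text\b[^>]*\bclass="gtitle"[^>]*>)(.*?)(</text>)', re.DOTALL
)
MAX_TITLE = 50  # character limit stated on this page


def trim_svg_title(svg: str, max_len: int = MAX_TITLE) -> str:
    """Cut an over-long title down to the text before the first semicolon."""

    def shorten(match):
        opening, title, closing = match.groups()
        if len(title) <= max_len:
            return match.group(0)  # short enough, leave untouched
        first_name = title.split(";", 1)[0].strip()
        return f"{opening}{first_name}{closing}"

    # Only the first matching element is considered.
    return TITLE_RE.sub(shorten, svg, count=1)


svg = ('<svg><text class="gtitle" x="10">Brompheniramine Maleate; '
       'Codeine Phosphate; Phenylephrine Hydrochloride</text></svg>')
print(trim_svg_title(svg))
```

A regular expression keeps the rest of the SVG byte-for-byte unchanged, which suits a small in-place edit; a title that has no semicolon is left at its full length by this sketch.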

The cases where this approach is unsatisfactory are probably relatively few, so further fixing is left to manual editing, i.e. download the file, adapt it in an SVG editing tool to create a better title, and either overwrite the original or upload the result as a separate file.

Copyright

Individual charts and data are theoretically uncopyrightable, though the copyright of the database underpinning the source website is presumed to be all rights reserved, in line with the website design. The site has a CC-BY-SA 4.0 release for "All ClinCalc DrugStats figures and graphs".

The DrugStats charts are generated on demand using Google Charts (see https://developers.google.com/chart). Rather than relying on this "un-open" service, the data tables are formatted for Commons using Plotly, which is free, stable for use in Python, and open source. A benefit is that the charts are saved in a vector format, making them excellent to display as thumbnails on Wikipedia, or for reusers to display in full page presentations.