Note: The COVID-19 Canada Open Data Working Group (CCODWG) has changed its data format, and the previous format stopped being updated on May 4, 2022. This affects my data products at the Canadian Public Health Region level (i.e. sub-province but larger than city neighborhoods; equivalent to US counties). I am working on updating my code to use the new data source; since I have a full-time job, this may take some time. In the meantime, up-to-date data products for health regions in Canada are not available past May 3, 2022. I apologize for the inconvenience.
- I am not an epidemiologist. I may get things wrong, and you should not use my analyses as predictive tools or for policy decisions.
- I am providing access to data along with some basic mathematical and statistical manipulations of said data, packaged into visualizations I think are useful. I am not making predictions about where the pandemic is headed next.
- Data on small populations are inherently more uncertain. Take any data product for local scales such as counties, neighborhoods, small states, etc., with a hefty grain of salt.
- These analyses are always retrospective. There is a known lag between when someone gets sick and when the databases find out about it, often several days to a week. When things are getting worse, assume they're worse than the data reveal. When they're getting better, however, assume you don't know what's actually going on, because there could be a growth in cases happening that hasn't shown up in the data yet. Additionally, many of my measures make use of rolling averages or running sums, which will also tend to increase the degree to which the analysis lags the real world. Where possible I use the center of the rolling window for the associated timestamp, which reduces the lag effect for historical data, but for recent data it absolutely still applies.
- The data are always incomplete. Not everyone who gets covid gets recorded in the databases: some people are asymptomatic and have no idea they're infected, some tested positive on an at-home rapid test and isolated without ever getting a PCR test, and some got sick and never got tested at all.
- Vaccination reduces the severity of infection. My analyses focus on case numbers, but there will come a point where that stops being a useful measure of the severity of an outbreak, because people won't get hospitalized or die from covid anymore.
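The point above about centered rolling windows can be illustrated with a short sketch. It is not the code behind the plots; the 7-day window, the function name `rolling_mean`, and the case counts are assumptions chosen for illustration.

```python
def rolling_mean(values, window=7, centered=True):
    """Rolling mean aligned with `values`; None where the window is incomplete.

    Assumes an odd `window` so that a centered window is symmetric.
    """
    half = window // 2
    result = [None] * len(values)
    for i in range(len(values)):
        if centered:
            start, stop = i - half, i + half + 1   # window straddles day i
        else:
            start, stop = i - window + 1, i + 1    # window ends on day i
        if start >= 0 and stop <= len(values):
            result[i] = sum(values[start:stop]) / window
    return result


# Hypothetical daily case counts rising by one per day.
daily_cases = list(range(10, 21))

centered = rolling_mean(daily_cases, centered=True)
trailing = rolling_mean(daily_cases, centered=False)
```

On this steadily rising series the centered mean matches the actual count on every day where it is defined, while the trailing mean reports the level from three days earlier. The trade-off is that a full centered window does not exist for the three most recent days, so recent data remain affected by the lag.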
My Data and Methodology Sources
For Toronto and Ontario data, I source my data directly from the respective public health authorities. Canadian public health region data is sourced from the COVID-19 Canada Open Data Working Group. For US and global data, I use the Johns Hopkins COVID-19 Dashboard GitHub repository.
- COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University
- Status of COVID-19 cases in Ontario by Public Health Unit (PHU) - Ontario Data Catalogue
- Epidemiological Data from the COVID-19 Outbreak in Canada, COVID-19 Canada Open Data Working Group
- Toronto Public Health COVID-19 Cases
- Rt is computed using the Bayesian method described in Yap & Yong (2020)
Most data is synced twice a day, at 10:30am ET and 4:30pm ET, and plots are updated over the next 30 minutes. US counties for individual states are updated once a day over 12 hours starting at 1:30pm ET. During this time, individual plots for specific health regions may not be available. Refer to the timestamp at the bottom of the page for an individual county to see when it is likely to update. Note that data sources are updated by their respective public health authorities and data caretakers, and may not update at the same time or on a daily basis. In particular, weekends and statutory holidays may delay reporting. Testing backlogs can also cause the data to lag real-world caseloads by significant margins.
I have repackaged the data from the various datasets into a hierarchical data format (HDF5) file that is available to download using the links on the right-hand side of the page. These datasets are all formatted the same way, with similar reference patterns, from national to local scales. The datasets also include my Rt values, along with the posterior probability distributions and likelihood functions (these account for the bulk of the dataset's size). A slim version that omits the posteriors and likelihoods is also available.
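As a rough sketch of what can be done with a stored posterior, the snippet below summarizes a discretized posterior over Rt. How the posteriors are laid out inside the HDF5 files is not described here, so the inputs `grid` and `density`, the function name, and the numbers are all hypothetical.

```python
def summarize_posterior(grid, density, mass=0.95):
    """Summarize a discretized posterior over Rt.

    grid    -- ascending Rt values (hypothetical layout)
    density -- non-negative weights, one per grid point
    mass    -- probability mass of the equal-tailed credible interval
    """
    total = sum(density)
    probs = [d / total for d in density]          # normalize to sum to 1

    mean = sum(r * p for r, p in zip(grid, probs))
    mode = grid[max(range(len(probs)), key=probs.__getitem__)]
    prob_above_one = sum(p for r, p in zip(grid, probs) if r > 1.0)

    # Walk the cumulative distribution to find the interval bounds.
    tail = (1.0 - mass) / 2.0
    cumulative = 0.0
    lower = upper = None
    for r, p in zip(grid, probs):
        cumulative += p
        if lower is None and cumulative >= tail:
            lower = r
        if upper is None and cumulative >= 1.0 - tail:
            upper = r

    return {
        "mean": mean,
        "mode": mode,
        "interval": (lower, upper),
        "prob_above_one": prob_above_one,
    }


# Made-up posterior on a coarse grid, for illustration only.
summary = summarize_posterior(
    grid=[0.5, 0.75, 1.0, 1.25, 1.5],
    density=[1, 2, 4, 2, 1],
)
```

The probability that Rt exceeds 1 is often more informative than a point estimate, especially for small populations where the posterior is wide.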
A shapefile is also now available, which uses the same naming conventions as the locations in the HDF5 file. No COVID-19 data is stored in the shapefile, only location names, shape data, and coordinates, but the HDF5 file and shapefile can be used together to map the data. A Python script is also provided which contains a routine for performing the merge using a GeoPandas GeoDataFrame. The script requires Pandas, GeoPandas, numpy, and h5py.
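The provided script is the reference for the actual merge. Purely as an illustration of the idea, the sketch below joins a table of per-location values onto a table of shapes by a shared name column. The column names (`location`, `geometry`, `rt`), the function name, and the sample rows are assumptions, and plain pandas DataFrames stand in for the GeoDataFrame and the HDF5 contents.

```python
import pandas as pd


def attach_values(shapes, values, key="location"):
    """Left-join per-location values onto the shape table by a shared name.

    `shapes` may be a GeoDataFrame (it inherits `merge` from pandas);
    a plain DataFrame is used below so the sketch runs without GeoPandas.
    """
    # validate="one_to_one" raises if a location name is duplicated on
    # either side, which would otherwise silently multiply rows.
    return shapes.merge(values, on=key, how="left", validate="one_to_one")


# Stand-in for the shapefile: names plus placeholder geometry.
shapes = pd.DataFrame({
    "location": ["Region A", "Region B", "Region C"],
    "geometry": ["<polygon A>", "<polygon B>", "<polygon C>"],
})

# Stand-in for values extracted from the HDF5 file (hypothetical numbers).
values = pd.DataFrame({
    "location": ["Region A", "Region B"],
    "rt": [1.1, 0.9],
})

merged = attach_values(shapes, values)
```

A left join keeps every shape even when no data exist for it, so regions without values can still be drawn on a map (here `Region C` ends up with a missing `rt`).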
Breakdown of Toronto's daily cases by status: active, recovered, ever hospitalized, or dead. Note that 'hospitalized' is a status that applies to any of the other 3 categories. These data have not had a rolling average applied. The first plot gives the data on a linear vertical scale, while the second gives the data on a logarithmic vertical scale, on which exponential growth or decline appears as a straight line, making different rates easier to compare.
Left: PNG | PDF
Right: PNG | PDF
Raw and 2-week average of Toronto's effective reproductive number, Rt. This is the average number of people a sick person will infect. If it is above 1, then cases are increasing. If it is increasing, then transmission is increasing, even if cases are still declining. If it is decreasing, then transmission is declining, even if cases are still rising. Note that due to the existence of super-spreaders, this metric is not the same as the number of people a typical sick person will infect (i.e. a person selected at random from the cohort of infected people will typically infect fewer people than would be implied by Rt, but a small fraction will infect many more).
PNG | PDF
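The super-spreader caveat can be made concrete with a small simulation. This is a sketch under stated assumptions, not part of the Rt calculation: secondary infections are drawn from a negative binomial distribution (a common modelling choice for overdispersed transmission), and the values of `R` and `k` are made up.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

R = 1.2   # assumed mean number of secondary infections per case
k = 0.1   # assumed dispersion; smaller k means more super-spreading

# numpy parameterizes the negative binomial by (n, p); choosing n = k and
# p = k / (k + R) gives a distribution whose mean is R.
secondary = rng.negative_binomial(n=k, p=k / (k + R), size=100_000)

mean_infections = secondary.mean()            # close to R
median_infections = np.median(secondary)      # what a typical case does
share_infecting_nobody = (secondary == 0).mean()
```

With these assumed parameters the mean stays near `R`, but most simulated cases infect nobody and a small fraction infect many, which is why the average and the typical case differ.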
Effective reproductive numbers for all Toronto neighborhoods. The opacity of each line depends on the number of recent cases in that neighborhood (total in the last 3 weeks). This means that the darkest lines largely determine the behavior of Toronto's overall effective reproductive number.
PNG | PDF
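One way such case-weighted opacity could be computed is sketched below. The linear scaling, the minimum opacity `floor`, the function names, and the counts are assumptions for illustration; the plots may use a different mapping.

```python
def recent_total(daily_cases, days=21):
    """Total cases over the most recent `days` entries (3 weeks by default)."""
    return sum(daily_cases[-days:])


def line_opacity(recent_cases, floor=0.05):
    """Map each neighborhood's recent case total to an opacity in [floor, 1].

    Opacity scales linearly with the case total relative to the busiest
    neighborhood; `floor` keeps quiet neighborhoods faintly visible.
    """
    peak = max(recent_cases.values())
    if peak == 0:
        return {name: floor for name in recent_cases}
    return {name: max(floor, count / peak)
            for name, count in recent_cases.items()}


# Hypothetical 3-week totals for three neighborhoods.
opacity = line_opacity({"Neighborhood A": 200,
                        "Neighborhood B": 50,
                        "Neighborhood C": 0})
```

The resulting values can be passed as the `alpha` argument when drawing each line or bar in a plotting library such as matplotlib.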
Current effective reproductive numbers for all Toronto neighborhoods. The opacity of each bar depends on the number of recent cases in that neighborhood (total in the last 3 weeks). This means that the darkest bars largely determine the behavior of Toronto's overall effective reproductive number.
PNG | PDF
Current time derivative of the effective reproductive number for all Toronto neighborhoods. The opacity of each bar depends on the number of recent cases in that neighborhood (total in the last 3 weeks). This metric indicates how fast transmission is increasing or decreasing. Note that transmission can increase while cases are decreasing, or decrease while cases are still increasing.
PNG | PDF
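A time derivative of Rt can be approximated with finite differences. The sketch below assumes a daily Rt series and uses central differences; the function name and the Rt values are hypothetical, and the plotted derivative may be computed differently.

```python
def time_derivative(series, dt=1.0):
    """Approximate d(series)/dt on an evenly spaced grid.

    Uses central differences in the interior and one-sided differences
    at the two ends, so the output has the same length as the input.
    """
    n = len(series)
    if n < 2:
        raise ValueError("need at least two points to take a derivative")
    derivative = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        derivative.append((series[hi] - series[lo]) / ((hi - lo) * dt))
    return derivative


# Hypothetical daily Rt values: falling, but still above 1.
rt = [1.4, 1.3, 1.2, 1.1, 1.05]
drt_dt = time_derivative(rt)
```

In this example every derivative is negative while every Rt value is above 1: transmission is decreasing even though cases are still increasing.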
Current 2-week mean time derivative of the effective reproductive number Rt for each province, with opacity weighted by current active cases per capita. This tracks whether transmission is increasing or decreasing, regardless of whether daily cases are increasing or decreasing (increasing transmission while cases are decreasing means that the decline in cases is slowing, and could reverse).
PNG | PDF
County-level data is also available. Please note that depending on county population and characteristics, county-level data may not be as reliable.
Current 2-week mean time derivative of the effective reproductive number Rt for each state, with opacity weighted by current active cases per capita. This tracks whether transmission is increasing or decreasing, regardless of whether daily cases are increasing or decreasing (increasing transmission while cases are decreasing means that the decline in cases is slowing, and could reverse).
PNG | PDF
The current COVID-19 death toll for each country, given as the cohort size per death (e.g. 1 death per 1000 people). The US and Canada are marked with an asterisk. Note that countries with smaller values are harder hit (1 death per 300 is worse than 1 death per 1000), and the y-axis is on a logarithmic scale.
PNG | PDF
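The cohort-size metric is simply population divided by cumulative deaths. A minimal sketch, with a hypothetical function name and made-up numbers:

```python
def people_per_death(population, deaths):
    """Cohort size per death: how many residents there are for each death.

    Smaller values mean a country is harder hit. Returns infinity when
    no deaths have been recorded.
    """
    if deaths == 0:
        return float("inf")
    return population / deaths


# Hypothetical countries, for illustration only.
country_x = people_per_death(population=1_000_000, deaths=2_500)
country_y = people_per_death(population=1_000_000, deaths=1_000)
```

Here `country_x` works out to 1 death per 400 people and `country_y` to 1 death per 1000 people, so `country_x` is the harder hit of the two despite having the smaller value.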