Measuring Sample Size

StreetLight’s Big Data resources include about 65M devices in the US and Canada, which covers approximately 23% of these countries’ combined adult population. However, clients should not expect a 23% penetration rate for all StreetLight InSight analyses they run. Penetration rates for individual analyses can range from as small as 1% to as large as 35%.

As is the case with any Big Data provider, sample size and penetration rate for a given analysis depend on the specific parameters used in the study. The reason is that some data are useful for certain analyses but not for others. For example, a device may deliver high-quality, clean location data for one study, but messy, unusable location data, or no data at all, for another. Efficiently identifying the data that are “useful” for a particular analysis is a critical component of the data science value that differentiates StreetLight Data.

Because penetration rates vary, sample sizes are automatically provided for almost all StreetLight InSight analyses. This allows users to calculate penetration rates and to better evaluate the representativeness of the sample. Sample size values are also useful to clients who wish to normalize StreetLight InSight results through additional statistical analysis.
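As a rough illustration of how a reported sample size can be turned into a penetration rate, the sketch below divides the sample by the population it is meant to represent. The function name and all figures are hypothetical, chosen only for illustration; they are not StreetLight InSight outputs, and the appropriate population value depends on the study.

```python
def penetration_rate(sample_size: int, population: int) -> float:
    """Return the sample size as a fraction of the reference population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return sample_size / population


# Hypothetical figures: devices in the sample for one analysis, and an
# independent estimate of the population the analysis should represent.
sample_devices = 1_200
study_population = 40_000

rate = penetration_rate(sample_devices, study_population)
print(f"Penetration rate: {rate:.1%}")
```

With these assumed inputs the rate falls inside the 1% to 35% range described above; a real study would substitute the sample size reported for the analysis and a population figure from an external source.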

For LBS analyses, sample size is currently provided as the number of unique devices, the number of trips, or both, depending on the type of analysis. These values should be thought of as most similar to “person trips.” Providing both the number of devices and the number of trips for all LBS analyses is on our product roadmap. For navigation-GPS analyses, sample size is provided as the number of trips. These should be thought of as “vehicle trips.”

In general, though not always, the trip sample size for commercial navigation-GPS data will be higher than the device (truck) sample size. Commercial trucks in active use typically take many trips per week, often on set routes, so each truck in the sample contributes many trips. These trucks are also more likely to have up-to-date fleet management tools, which makes them more likely to be included in StreetLight’s navigation-GPS data set. Trucks that are used more rarely are less likely to be included in the data set.

In general, though not always, the trip sample size for LBS data will be lower than the device (person) sample size. The reason is that not all devices in StreetLight’s database capture every single trip perfectly. To illustrate, consider this hypothetical example:

  • 8:00AM: Device creates location data at expected home location
  • 2:00PM: Device creates location data at sports arena


This device has created useful information for analyzing the home locations of visitors to the arena. However, because the device did not create any location data on the trip to the arena, perhaps because it was off, the route taken and the travel time cannot be calculated with certainty. As a result, the device could not be used in an analysis of road activity on an arterial near the arena.

As another example, consider a device that generates regular pings for each trip taken over the first 10 days of a 30-day month. The user then deletes the smartphone app that created the data, so the device stops pinging and disappears for the last 20 days of the month. The device’s data can still be used, but the trip penetration for the month is only about 33% of this person’s trips, not 100%.
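The arithmetic behind the 33% figure can be sketched as follows. This is a simplified illustration that assumes the person takes roughly the same number of trips every day, so the share of days with data approximates the share of trips captured; the function name is hypothetical.

```python
def monthly_trip_penetration(active_days: int, days_in_month: int) -> float:
    """Approximate the share of a person's monthly trips that were captured.

    Assumes trips are spread evenly across the days of the month.
    """
    if days_in_month <= 0 or not 0 <= active_days <= days_in_month:
        raise ValueError("active_days must be between 0 and days_in_month")
    return active_days / days_in_month


# The device pings for 10 days, then is silent for the remaining 20 days.
share = monthly_trip_penetration(active_days=10, days_in_month=30)
print(f"Trip penetration for the month: {share:.0%}")
```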

Typical daily trip penetration rates are between 1% and 5% of all trips on any one specific day. StreetLight’s pricing and data structure encourage looking at many days of data: the cost of analyzing an average day across three months is the same as the cost of analyzing a single day. Thus, we encourage clients to evaluate the total sample across the entire study period instead of focusing on per-day penetration rates.
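To illustrate why the total sample across a study period is more informative than any single day, the sketch below compares per-day penetration with the pooled sample over several days. All counts are hypothetical, and the daily trip totals stand in for an external estimate of actual trips; neither comes from StreetLight InSight.

```python
# Hypothetical sampled trips per day, and hypothetical estimates of all
# trips actually made on those days.
daily_sampled_trips = [120, 95, 130, 110]
daily_total_trips = [4_000, 3_800, 4_200, 4_000]

# Per-day penetration: each day's sample as a fraction of that day's trips.
per_day_rates = [s / t for s, t in zip(daily_sampled_trips, daily_total_trips)]

# Pooled view: the total sample available when the whole period is analyzed.
pooled_sample = sum(daily_sampled_trips)
pooled_rate = pooled_sample / sum(daily_total_trips)

for day, day_rate in enumerate(per_day_rates, start=1):
    print(f"Day {day}: {day_rate:.1%}")
print(f"Pooled sample: {pooled_sample} trips ({pooled_rate:.1%} of all trips)")
```

In this example the pooled rate is similar to the daily rates, but the absolute number of sampled trips grows with every day added, which is what matters when judging whether the sample is large enough.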

Note: Sample sizes are not automatically provided for Visitor Home-Work, AADT, or Traffic Diagnostics Projects. They are available by request. These analyses use a very large volume of location data, so providing sample sizes automatically via StreetLight InSight would negatively impact data processing speeds.
