Measuring Sample Size

StreetLight’s Big Data resources include about 65M devices in the US and Canada, which covers approximately 23% of these countries’ combined adult population. However, clients should not expect a 23% penetration rate for all StreetLight InSight analyses they run. Penetration rates for individual analyses can range from as small as 1% to as large as 35%.

As is the case with any Big Data provider, sample size and penetration rate for a given analysis depend on the specific parameters used in the study. The reason is that some data are useful for certain analyses but not for others. For example, a device may deliver high-quality, clean location data for one study, but messy, unusable location data (or no data at all) for another. Efficiently identifying the data that are “useful” for a particular analysis is a critical component of the data science value that differentiates StreetLight Data. Because penetration rates vary, sample sizes are automatically provided for almost all StreetLight InSight analyses. This allows users to calculate penetration rates and to better evaluate the representativeness of the sample. Sample size values are also useful to clients who wish to normalize StreetLight InSight results through additional statistical analysis.
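As a rough illustration of how a penetration rate can be derived from a reported sample size, the Python sketch below divides a device sample size by a reference population. The function name, the variable names, and all numbers are hypothetical and chosen only for illustration. The choice of reference population (for example, the adult population of the study area) is an assumption the analyst must make, and this is not StreetLight's internal method.

```python
def penetration_rate(sample_size: int, population: int) -> float:
    """Return the share of a reference population captured by the sample."""
    if population <= 0:
        raise ValueError("population must be positive")
    return sample_size / population


# Hypothetical inputs for illustration only (not real StreetLight output).
devices_in_sample = 4_600                # device sample size reported for an analysis
adult_population_of_study_area = 20_000  # reference population chosen by the analyst

rate = penetration_rate(devices_in_sample, adult_population_of_study_area)
print(f"Penetration rate: {rate:.1%}")
```

The same ratio can be computed with trip sample sizes, as long as the denominator is an estimate of total trips rather than a count of people.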

For LBS trips-based analyses, such as Zone Activity and O-D, sample size is currently provided as both the number of unique devices and the number of trips. These values should be thought of as most similar to “person trips.” For the LBS points-based Visitor Home and Work analyses, only the device sample size is provided, because the Metric analyzes pings rather than trips. These values should be thought of as “person visits.” (Learn more about Visitor Home and Work Methodology.) For Navigation-GPS analyses, sample size is provided as the number of trips analyzed. These values should be thought of as “vehicle trips.”

Note: Just as the indexes are not comparable across data sources, the sample size Metrics are not comparable across data sources either. For example, the device sample size given for Visitor Home and Work analyses, which are LBS points-based, is not comparable to the device or trip sample sizes of other Project Types, which are Navigation-GPS or LBS trips-based analyses.

In general, though not always, the trip sample size for commercial Navigation-GPS data will be higher than the device (truck) sample size. Commercial trucks that are in active use typically take many trips per week that are often on set routes; thus, they are more likely to have up-to-date fleet management tools, and that means they are more likely to be included in StreetLight’s Navigation-GPS data set. Trucks that are more rarely used are less likely to be included in the data set.

In general, though not always, the trip sample size for LBS data will be lower than the device (person) sample size. The reason is that not all devices in StreetLight’s database capture every single trip perfectly. To illustrate, consider this hypothetical example:

  • 8:00AM: Device creates location data at expected home location
  • 2:00PM: Device creates location data at sports arena

This device has created useful information for analyzing the home locations of visitors to the arena. However, because the device didn’t create any location data on the trip to the arena, perhaps because it was off, the route taken and the travel time cannot be calculated with certainty. As a result, the device could not be used in an analysis of road activity on an arterial near the arena.

As another example, consider a device that generates regular pings for each trip taken over 10 days. The user then deletes the smartphone app that created the data, so the device stops pinging and disappears for the last 20 days of the month. The device’s data can still be used, but the trip penetration for the month is only 33% of this person’s trips, not 100%.
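The 33% figure in this example can be reproduced with a short Python sketch. It assumes a 30-day month and that the person takes roughly the same number of trips every day. Both are simplifying assumptions for illustration, and the function name is hypothetical.

```python
def trip_penetration(days_observed: int, days_in_period: int) -> float:
    """Share of a person's trips captured, assuming an equal number of trips per day."""
    if days_in_period <= 0:
        raise ValueError("days_in_period must be positive")
    if not 0 <= days_observed <= days_in_period:
        raise ValueError("days_observed must be between 0 and days_in_period")
    return days_observed / days_in_period


# Device pings for 10 days, then disappears for the remaining 20 days of the month.
share = trip_penetration(days_observed=10, days_in_period=30)
print(f"Trip penetration for the month: {share:.0%}")
```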

Typical daily trip penetration rates are between 1% and 5% of all trips on any one specific day. StreetLight’s pricing and data structure encourage looking at many days of data: the cost of analyzing an average day across three months is the same as the cost of analyzing a single day. Thus, we encourage clients to evaluate the total sample across the entire study period instead of focusing on per-day penetration rates.
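To illustrate why the total sample across a study period is the more useful figure, the Python sketch below pools several days of data. All counts are invented for illustration and the variable names are hypothetical. The pooled penetration rate is a weighted average of the daily rates, so it stays within the daily range, while the absolute number of sampled trips grows with every day added.

```python
# Hypothetical daily counts for a five-day study period (illustration only).
daily_sampled_trips = [30, 45, 20, 50, 35]            # trips in the sample each day
daily_total_trips = [1_000, 1_100, 900, 1_200, 1_000]  # estimated trips taken each day

# Per-day penetration rates.
daily_rates = [s / t for s, t in zip(daily_sampled_trips, daily_total_trips)]

# Pooled figures for the whole study period.
total_sample = sum(daily_sampled_trips)
pooled_rate = total_sample / sum(daily_total_trips)

for day, rate in enumerate(daily_rates, start=1):
    print(f"Day {day}: {rate:.1%}")
print(f"Total sampled trips: {total_sample}")
print(f"Pooled penetration rate: {pooled_rate:.1%}")
```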

Note: Sample sizes are not automatically provided for AADT or Traffic Diagnostics Projects. They are available by request. These analyses use a very large volume of location data, so providing sample sizes automatically via StreetLight InSight would negatively impact data processing speeds.
