Choosing a Data Source

StreetLight Data offers Metrics derived from two different types of Big Data resources: Location-Based Services (LBS) data and navigation-GPS data. Some Types and Modes of Travel will automatically select a data source for you, but others require you to choose your data source. This How-To guide will show you how to choose the data source that makes the most sense for your analysis.

1. The Quick Answer: Navigation-GPS for Commercial, LBS with Pass-through for Personal Travel

In a nutshell, if you need to analyze commercial truck behavior, you should always use navigation-GPS data. If you want to analyze personal travel, you should almost always use LBS with Pass-through. However, there are a few small exceptions, as discussed in the detailed explanation below.

We try to keep StreetLight InSight as flexible as possible, including in your choice of data source. StreetLight InSight automatically populates the data source field based on the other criteria you select when creating your Project, but you can override this selection and choose your own data source for most Projects.

Keep in mind that if you want to run a Project twice (for example, once with navigation-GPS for commercial vehicle analytics and once with LBS with Pass-through for personal travel), that’s usually fine with us. However, the StL Index values in the two sets of Project Metrics will not be directly comparable, because our normalization algorithms for navigation-GPS and LBS are different.

2. Big Data Decision Tree

Follow this decision tree to choose your data source.




3. Navigation-GPS Data vs. Location-Based Services Data: A Detailed Breakdown

In most cases, you have a choice about which data source to use. Each data source has unique benefits and limitations that guide our recommendations.

These are the key characteristics of LBS data that you should consider:

  • LBS data has a larger sample size, so it’s ideal if getting a highly comprehensive sample is important to you. Keep in mind that our sample size varies regionally and temporally, so each Project you run will have a unique sample size and penetration rate.
  • LBS data is more representative of the total population, so you’re more likely to get representative results.
  • LBS data has more accurate Demographic and Trip Purpose Attributes. Because we have a month’s worth of data for LBS devices, we can more accurately infer attributes like home and work neighborhoods. This in turn allows for more accurate inference of demographics and trip purposes.
  • LBS data is better normalized. Because StreetLight InSight can analyze the activity of devices over a full month with LBS data, we can associate a device with a home block group and normalize our sample based on home geographies. (That’s what our LBS StreetLight Index does.) For example, a device that lives in a block group with 5 StreetLight devices and 100 residents counts for 20, whereas a device that lives in a block group with 10 StreetLight devices and 100 residents counts for 10. Home-based geographic normalization also corrects for many biases in income, race, and other factors.
  • LBS data comes from smartphones, and thus covers all modes. Because the smartphone is presumably always with the owner of the phone, the raw data sets include all modes, for example, train, bike, etc. At this time, we only include LBS trips that are both: 1) longer than 500 meters and 2) longer than 3 minutes. This means we probably exclude some bike and pedestrian trips. We are working on analytics to differentiate modes. When we add full modal support, we will consider adding those back in. Note, this does mean that commercial truck drivers with smartphones in their pockets would be captured in our LBS dataset.
  • LBS data goes back to 2016. Thus, studies that need data before 2016 should consider navigation-GPS or other options.
  • LBS data pings less frequently, on average, than navigation-GPS. Our personal navigation-GPS devices ping several times per minute, and sometimes as frequently as once per second. Because there are less frequent pings for LBS devices, the speed data is not as precise. As a result, quick changes in speed will be averaged out in LBS data.
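The home-based normalization example and the trip filter described in the list above can be illustrated with a short sketch. This only reproduces the arithmetic of the example (residents divided by sampled devices) and the two stated thresholds; it is not StreetLight's actual algorithm, and the function names `device_weight` and `is_included_trip` are hypothetical.

```python
def device_weight(residents: int, sample_devices: int) -> float:
    """Number of residents each sampled device stands for in its home block group.

    Illustrative only: mirrors the worked example in the text.
    """
    if sample_devices <= 0:
        raise ValueError("block group has no sampled devices")
    return residents / sample_devices


def is_included_trip(distance_m: float, duration_min: float) -> bool:
    """LBS trip filter from the text: longer than 500 meters AND longer than 3 minutes."""
    return distance_m > 500 and duration_min > 3


print(device_weight(100, 5))      # 20.0 (5 devices, 100 residents)
print(device_weight(100, 10))     # 10.0 (10 devices, 100 residents)
print(is_included_trip(400, 5))   # False: too short in distance
print(is_included_trip(1200, 6))  # True
```

Note how the weight falls as the number of sampled devices in a block group rises, which is why well-sampled neighborhoods do not dominate the results.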

These are the key characteristics of navigation-GPS data that you should consider:

  • Navigation-GPS data differentiates personal trips from commercial trips. Our navigation-GPS data comes tagged as from a personal vehicle, from a commercial vehicle fleet of Heavy Duty vehicles, or from a commercial vehicle fleet of Medium Duty vehicles. Thus, our navigation-GPS analytics differentiate these types of travel.
  • Personal navigation-GPS Data has a much smaller sample size than LBS. This is the main reason we recommend avoiding Personal navigation-GPS data where possible. Commercial navigation-GPS data has a more robust sample size (10-12%).
  • Navigation-GPS devices ping very frequently. If you need detailed speed or congestion information on a small segment (e.g., a few hundred meters), navigation-GPS is a better choice than LBS, which pings less frequently. Likewise, if you need detailed travel times or travel time distributions, navigation-GPS data is better because of its high-frequency ping rate.
  • Personal navigation-GPS comes predominantly from connected cars. This means it is far more certain to describe a vehicular mode of travel. If you need vehicular traffic data for very dense, urban environments that have trips of all modes traveling at similar speeds on the same road, then navigation-GPS can be useful to differentiate vehicles from other modes.
  • Personal navigation-GPS Data “rotates” IDs very regularly. This means that we can’t analyze the behavior of one device over time. Thus, inference of key locations, demographics, and trip purpose is less accurate than for LBS.
  • Navigation-GPS Data goes back to 2014. If you’re analyzing travel from 2014 or 2015, you should use navigation-GPS data. Please contact us to discuss caveats and best practices for scaling data from those years.
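The characteristics above can be condensed into a rough rule-of-thumb sketch. This is an illustration built only from the recommendations in this article, not the logic StreetLight InSight uses to populate the data source field; the `AnalysisNeeds` class, its field names, and `recommend_data_source` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AnalysisNeeds:
    """Hypothetical summary of what an analysis requires."""
    commercial: bool = False                  # commercial truck behavior
    earliest_year: int = 2016                 # first year of data needed
    needs_fine_speed: bool = False            # speed/travel time on short segments
    dense_urban_vehicles_only: bool = False   # must separate vehicles from other modes


def recommend_data_source(needs: AnalysisNeeds) -> str:
    """Return a suggested data source based on the guidance in this article."""
    if needs.earliest_year < 2014:
        raise ValueError("neither source described here covers years before 2014")
    if needs.commercial:
        return "navigation-GPS"   # only source that tags commercial trips
    if needs.earliest_year < 2016:
        return "navigation-GPS"   # LBS data starts in 2016
    if needs.needs_fine_speed or needs.dense_urban_vehicles_only:
        return "navigation-GPS"   # higher ping rate, vehicle-only sample
    return "LBS with Pass-through"  # default for personal travel


print(recommend_data_source(AnalysisNeeds(commercial=True)))
print(recommend_data_source(AnalysisNeeds()))
```

Treat the result as a starting point: an analysis that mixes these needs may be better served by running two sets of Metrics.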

What is “Blended Data”?

For some Project Types, you’ll see “Blended Data” as the only Data Source available. This means that this particular Project Type’s algorithms are already designed to optimize use of all our data sources by blending them together.

Our Recommendations Evolve Over Time

Keep in mind that our analytics, our data sources, and thus our recommendations are always evolving. For example, in early 2018 we made a key change: we began processing LBS data to support “pass-through” Zones and road segment Zones. We developed algorithms that connect the less frequent pings from LBS devices to infer route information for LBS trips. We tested the results and found them to be more accurate than the results for personal navigation-GPS trips. Therefore, we now recommend LBS for personal travel analyses generally, not only those that focus on activities rather than trips.

4. Still Can't Choose?

If you still can’t decide which type of data to use, that might be a sign that you need to run two sets of Metrics to get the best results.

To see how that works, watch a recorded webinar featuring Sean McAtee of Cambridge Systematics discussing his work on a travel demand model for the Southeast Michigan Council of Governments.

Sean optimized the calibration data for this model by using navigation-GPS data to understand the travel patterns of commercial trucks and LBS data to understand the travel patterns of individuals.

As always, the StreetLight Data Support Team is here to help. Send us an email to ask for help selecting your data source.

