All About Our Data Sources

StreetLight Data offers Metrics derived from two different types of Big Data resources: Location-Based Services (LBS) data and Navigation-GPS data. Some Types and Modes of Travel will automatically select a data source for you, but others require you to choose your data source. This guide walks you through our data sources so you can make an informed choice about which one to use for your analysis.

What are our key data sources? 

Navigation-GPS Data vs. Location-Based Services Data: A Detailed Breakdown

In some cases, you have a choice about which data source to use. Learn more about what determines if a choice is available or not. Each data source has unique benefits and limitations that guide our recommendations.

These are the key characteristics of LBS data that you should consider:

  • LBS data has a larger sample size, so it’s ideal if getting a highly comprehensive sample is important to you. Keep in mind that our sample size varies regionally and temporally, so each Project you run will have a unique sample size and penetration rate.
  • LBS data is more representative of the total population, so you’re more likely to get representative results.
  • LBS data has more accurate Demographic and Trip Purpose Attributes. Because we have a month’s worth of data for LBS devices, we can more accurately infer attributes like home and work neighborhoods. This, in turn, allows for more accurate inference of demographics and trip purposes.
  • LBS data is better normalized. Because StreetLight InSight can analyze the activity of devices over a full month with LBS data, we can associate a device with a home block group and normalize our sample based on home geographies. (That’s what our LBS StreetLight Index does.) For example, a device that lives in a block group with 5 StreetLight devices and 100 residents counts for 20, whereas a device that lives in a block group with 10 StreetLight devices and 100 residents counts for 10. Home-based geographic normalization also normalizes for many biases in income, race, and other factors. We do have some caveats when comparing indexes; learn more about those caveats.
  • LBS data comes from smartphones, and thus covers all modes. Because the smartphone is presumably always with its owner, the raw data sets include all modes, for example, train, bike, etc. At this time, we only include LBS trips that are both 1) longer than 500 meters and 2) longer than 3 minutes. Note that this means commercial truck drivers with smartphones in their pockets are captured in our LBS dataset. When using "All Modes" for LBS data, we exclude some bicycle and pedestrian trips because of the trip definition above. We have added separate modes for bicycle and pedestrian trips using the LBS data, and we are working on analytics to differentiate even more modes. When we add full modal support, we will consider adding the excluded trips back in.
  • LBS data provides bicycle and pedestrian data. Because our LBS data captures raw data sets for all modes, we are able to differentiate between "All Modes" trips (which includes all modes of trips), bicycle trips, and pedestrian trips. We use our LBS data and machine learning processes to detect modes of devices as they ping. Then we string together the pings to come up with a probable trip mode. Learn more about our bike/ped methodology. 
  • LBS data goes back to 2016. Thus, studies that need data before 2016 should consider Navigation-GPS or other options.
  • LBS data pings less frequently, on average, than Navigation-GPS. Our personal Navigation-GPS devices ping several times per minute, and sometimes as frequently as once per second. Because LBS devices ping less frequently, the speed data is not as precise. As a result, quick changes in speed will be averaged out in LBS data.
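
The normalization arithmetic and the trip definition described in the list above can be sketched in a few lines of Python. This is an illustration under simple assumptions, not StreetLight's actual implementation: the function and class names are hypothetical, and the weight is assumed to be residents divided by sampled devices, which reproduces the 5-device and 10-device example.

```python
from dataclasses import dataclass


def home_block_group_weight(residents: int, sampled_devices: int) -> float:
    """Residents represented by each sampled device in a home block group.

    Assumption: weight = residents / sampled devices (hypothetical helper).
    """
    if sampled_devices <= 0:
        raise ValueError("need at least one sampled device")
    return residents / sampled_devices


@dataclass
class Trip:
    """Hypothetical trip record with only the fields the filter needs."""
    distance_m: float
    duration_min: float


def meets_lbs_trip_definition(trip: Trip) -> bool:
    """True only when the trip is longer than 500 m AND longer than 3 minutes."""
    return trip.distance_m > 500 and trip.duration_min > 3


print(home_block_group_weight(100, 5))   # 20.0
print(home_block_group_weight(100, 10))  # 10.0
print(meets_lbs_trip_definition(Trip(distance_m=400, duration_min=6)))  # False
```

The last line shows why some short bicycle and pedestrian trips drop out of "All Modes" results: a 400-meter walk fails the distance test even though it lasts longer than 3 minutes.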

These are the key characteristics of Navigation-GPS data that you should consider:

  • Navigation-GPS data differentiates personal trips from commercial trips. Our Navigation-GPS data comes tagged as from a personal vehicle, from a commercial vehicle fleet of Heavy Duty vehicles, or from a commercial vehicle fleet of Medium Duty vehicles. Thus, our Navigation-GPS analytics differentiate these types of travel.
  • Personal Navigation-GPS Data has a much smaller sample size than LBS. This is the main reason we recommend avoiding Personal Navigation-GPS data where possible. Commercial Navigation-GPS data has a more robust (10-12%) sample size.
  • Navigation-GPS devices ping very frequently. If you need detailed speed or congestion information on a small segment (e.g., a few hundred meters), Navigation-GPS is a better choice than LBS, which pings less frequently. Likewise, if you need detailed travel times or travel time distributions, Navigation-GPS data is better because of its high-frequency ping rate.
  • Personal Navigation-GPS comes predominantly from connected cars. This means it is far more certain to describe a vehicular mode of travel. If you need vehicular traffic data for very dense, urban environments that have trips of all modes traveling at similar speeds on the same road, then Navigation-GPS can be useful to differentiate vehicles from other modes.
  • Personal Navigation-GPS Data “rotates” IDs very regularly. This means that we can’t analyze the behavior of one device over time. Thus, inference of key locations, demographics, and trip purpose is less accurate than for LBS.
  • Navigation-GPS Data goes back to 2014. If you’re analyzing travel from 2014 and 2015, you should use Navigation-GPS data. Please contact us to discuss caveats and best practices for scaling data from those years.
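
To see why ping frequency matters for speed analysis, consider the toy example below. It is a sketch with made-up numbers, not real StreetLight data: the vehicle profile, the 15-second sparse interval, and the `segment_speeds` helper are all assumptions chosen for illustration.

```python
def segment_speeds(positions_m, interval_s):
    """Average speed (m/s) between consecutive pings taken every interval_s seconds.

    positions_m[i] is the hypothetical distance along the road at second i.
    """
    sampled = positions_m[::interval_s]
    return [(b - a) / interval_s for a, b in zip(sampled, sampled[1:])]


# Hypothetical vehicle: 10 s at 20 m/s, 10 s stopped, then 10 s at 20 m/s.
per_second_speed = [20] * 10 + [0] * 10 + [20] * 10
positions = [0]
for v in per_second_speed:
    positions.append(positions[-1] + v)

dense = segment_speeds(positions, 1)    # one ping per second
sparse = segment_speeds(positions, 15)  # one ping every 15 seconds (assumed)

print(min(dense), max(dense))  # the stop and the full speed are both visible
print(sparse)                  # both segments average to the same value
```

With one ping per second, the 10-second stop shows up as zero speed. With one ping every 15 seconds, each segment averages moving and stopped time together, so the stop disappears from the speed profile.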

What is “Blended Data”?

For some Project Types, you’ll see “Blended Data” as the only Data Source available. This means that this particular Project Type’s algorithms are already designed to optimize the use of all our data sources by blending them together. Because this Data Source is based on an algorithm, it cannot be altered or compared to other Data Sources.

Can StreetLight's recommendations change?

Keep in mind that our analytics, our data sources, and thus our recommendations are always evolving. For example, in early 2019 we made a key change: we added new data partners and sources to our LBS with Pass-through data source. This change generally doubled our sample size, and even quadrupled it when directly comparing January 2018 and January 2019. It reinforced our recommendation to use LBS with Pass-through when analyzing personal trips.

What's your quick answer for what Data Source to use and when?

Answer: If you're given a choice, use Navigation-GPS for Commercial Travel and LBS with Pass-through for Personal Travel.

In a nutshell, if you need to analyze commercial truck behavior, you should always use Navigation-GPS data. If you want to analyze personal travel, you should almost always use LBS with Pass-through. However, there are a few small exceptions, as discussed in the detailed explanation below.
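
As a rough summary, the guidance in this article can be written as a small decision function. This is only a sketch of the rules of thumb stated here, not a StreetLight InSight feature or API: the function name, the travel labels, and the `needs_fine_speed` flag are hypothetical, and real Projects may have constraints this does not capture.

```python
def recommend_data_source(travel: str, year: int, needs_fine_speed: bool = False) -> str:
    """Suggest a data source from the rules of thumb in this article.

    travel: "commercial", "personal", "bicycle", or "pedestrian" (hypothetical labels).
    year: the year of travel being analyzed.
    needs_fine_speed: True for detailed speed or travel-time work on short segments.
    """
    if year < 2014:
        raise ValueError("neither data source goes back before 2014")
    if travel == "commercial":
        return "Navigation-GPS"
    if travel in ("bicycle", "pedestrian"):
        if year < 2016:
            raise ValueError("bicycle and pedestrian data comes from LBS, which starts in 2016")
        return "LBS"
    if travel == "personal":
        # Exceptions: years before LBS coverage, or work that needs frequent pings.
        if year < 2016 or needs_fine_speed:
            return "Navigation-GPS"
        return "LBS with Pass-through"
    raise ValueError(f"unknown travel type: {travel!r}")


print(recommend_data_source("commercial", 2019))  # Navigation-GPS
print(recommend_data_source("personal", 2019))    # LBS with Pass-through
print(recommend_data_source("personal", 2015))    # Navigation-GPS
```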

We try to keep StreetLight InSight as flexible as possible and to allow maximum flexibility with our data sources. While StreetLight InSight will automatically populate the data source field based on the other criteria you select when creating your Project, you can override this selection and choose your data source for most Projects.*

Keep in mind that if you want to run a Project twice (for example, once with Navigation-GPS for commercial vehicle analytics and once with LBS with Pass-through for personal travel), that’s usually fine with us. However, StL Index values provided in your Project Metrics will not be directly comparable because our normalization algorithms for Navigation-GPS and LBS are different.

*You cannot change Project setup for several Project Types, including Estimated AADT Values, Traffic Diagnostics, and Visitor Home and Work Analysis.

Still can't choose which Data Source to use?

If you still can’t decide which type of data to use, that might be a sign that you need to run two sets of Metrics to get the best results.

To see how that works, watch a recorded webinar featuring Sean McAtee of Cambridge Systematics discussing his work on a travel demand model for the Southeastern Michigan Council of Governments. 

Sean optimized the calibration data for this model by using Navigation-GPS data to understand the travel patterns of commercial trucks and LBS data to understand the travel patterns of individuals.

As always, the StreetLight Data Support Team is here to help. Send us an email at support@streetlightdata.com to ask for help selecting your data source.

 
