Wednesday, September 26, 2012

Completely Unnecessary Statistical Analysis: Phone Directory

by Ryan O'Horo @RyanOHoro
Disclaimer: I am not a statistician.

A particular style of telephone company directory allows callers to “dial by name” to reach a person; the system plays back the names of the matching contacts before connecting. In the example used here, input must be given as surname + given name with a minimum of three digits using the telephone keypad (e.g. Smith = 764). Only the eight keys 2 through 9 carry letters, so covering every three-digit input means 8^3, or 512, combinations. With a directory that allowed repeated searches in the same call, it would take about seven hours of dialing to cover all possible combinations.
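The keypad conversion and the 512 figure can be sketched in a few lines of Python. This is a minimal illustration, not the function used for the analysis: the names `KEYPAD` and `to_pattern` are made up here, and skipping non-letter characters (such as the apostrophe in O'Horo) is an assumption about how a directory would treat them.

```python
# Standard telephone keypad letter groups; only digits 2-9 carry letters.
KEYPAD = {
    "2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
    "6": "MNO", "7": "PQRS", "8": "TUV", "9": "WXYZ",
}
LETTER_TO_DIGIT = {
    letter: digit for digit, letters in KEYPAD.items() for letter in letters
}


def to_pattern(name, length=3):
    """Return the keypad digits for the first `length` letters of `name`."""
    # Assumption: characters that are not on the keypad are simply skipped.
    letters = [ch for ch in name.upper() if ch in LETTER_TO_DIGIT]
    return "".join(LETTER_TO_DIGIT[ch] for ch in letters[:length])


print(to_pattern("Smith"))  # 764
print(len(KEYPAD) ** 3)     # 512 possible three-digit patterns
```

At roughly seven hours for 512 searches, that works out to a little under 50 seconds per search.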

Let’s use available data to try and reduce the complexity of the problem while increasing the return on effort - like the giant nerds we are.

The 2000 U.S. Census provided raw data[1] on over 150,000 surnames occurring 100 or more times in the population. This puts the lowest occurrence of a surname in the data at 1 in 2,500,000. The uncounted surnames[2] represent 10.25% of people counted in the 2000 Census. This means our data only cover 89.75%* of the U.S. population, but we can safely assume† that the remaining names closely follow the patterns established in the data we do have available.

In this analysis, the first three characters of each surname in the Census data were converted into a three-digit combination using a telephone keypad conversion function. The resulting data were manipulated using an Excel pivot table to group matching combinations and sum the percentage of occurrence. This resulted in a table that ranked each combination. To facilitate the creation of interactive charts, these data were then imported into a Google Spreadsheet[3].
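The pivot-table step could be approximated in Python roughly as follows. This is a sketch under assumptions, not the workflow actually used (that was Excel): the function names are hypothetical, the input is assumed to be (surname, percentage) pairs, and the sample rows are placeholders rather than Census figures.

```python
from collections import defaultdict

GROUPS = ["ABC", "DEF", "GHI", "JKL", "MNO", "PQRS", "TUV", "WXYZ"]
# "ABC" maps to 2, "DEF" to 3, and so on up to "WXYZ" on 9.
LETTER_TO_DIGIT = {ch: str(i + 2) for i, group in enumerate(GROUPS) for ch in group}


def to_pattern(surname):
    """Convert the first three letters of a surname to keypad digits."""
    letters = [ch for ch in surname.upper() if ch in LETTER_TO_DIGIT]
    return "".join(LETTER_TO_DIGIT[ch] for ch in letters[:3])


def rank_patterns(rows):
    """Group surnames by pattern, sum their shares, and sort best-first."""
    totals = defaultdict(float)
    for surname, share in rows:
        if len(surname) < 3:
            continue  # surnames shorter than three letters are left out
        totals[to_pattern(surname)] += share
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Placeholder shares for illustration only; these are NOT Census values.
sample = [("SMITH", 1.0), ("SNIDER", 0.5), ("CARTER", 0.25),
          ("BARNES", 0.25), ("NG", 0.125)]
print(rank_patterns(sample))  # [('764', 1.5), ('227', 0.5)]
```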

Results Summary

Unsurprisingly, the distribution of surnames across the patterns is non-uniform, with favorable spikes. Sorted by rank, the best pattern - 227 - should return 2% of the surnames at the average U.S. company. What’s more exciting is that a small amount of effort yields a disproportionately large share of the results. Searching patterns in rank order, best first, you need only 67 patterns, or 13% of all possible combinations, to return 50% of the surnames. To return 90% of the surnames you need only 241 patterns, or 47% of all possible combinations. Some milestones are listed in the chart below.

The following chart shows the curvilinear relationship of the expected returns versus the effort expended.
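The milestone figures above come from walking the ranked list and accumulating coverage. A minimal sketch of that calculation, assuming a list of per-pattern shares is available (the function name `patterns_needed` is hypothetical), along with the effort percentages quoted in the text:

```python
def patterns_needed(shares, target):
    """Count how many top-ranked patterns are needed to reach `target` coverage."""
    covered = 0.0
    for count, share in enumerate(sorted(shares, reverse=True), start=1):
        covered += share
        if covered >= target:
            return count
    return None  # the target cannot be reached with the given shares


TOTAL_PATTERNS = 8 ** 3  # 512

# Effort as a percentage of all patterns, for the milestones in the text.
print(round(100 * 67 / TOTAL_PATTERNS))   # 13
print(round(100 * 241 / TOTAL_PATTERNS))  # 47
```

Fed the real per-pattern shares from the spreadsheet, `patterns_needed(shares, 0.5)` would be expected to reproduce the 67-pattern milestone; that has not been checked here.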

Test Case

A test case was performed against an actual U.S. company phone directory, with a medium-sized population that happened to be heavily biased toward Polish surnames. Approximately 120 names were “randomly” selected based on a known list of employees and the patterns for each were searched. In spite of the bias, the test case correlated well with the expected results.

Pattern 627 (3rd rank) returned the highest number of surnames (6), pattern 227 (1st rank) the second highest (5), and pattern 726 (5th rank) the fourth highest (3). Averaging the estimates from these three data points gives a total population of about 300, which is close to the expected size of the company.
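One plausible reading of that estimate is that each pattern's hit count is divided by its expected share of surnames, and the per-pattern results are then averaged. The sketch below assumes that reading; `estimate_population` is a hypothetical name, and only the share for pattern 227 (about 2%) is given in the text, so it is the only data point shown.

```python
def estimate_population(observations):
    """Average the per-pattern estimates of hits / expected share."""
    estimates = [hits / share for hits, share in observations]
    return sum(estimates) / len(estimates)


# Pattern 227: 5 surnames returned, expected share of about 2% (from the text).
print(round(estimate_population([(5, 0.02)])))  # 250
```

On its own, pattern 227 suggests a population of about 250. The estimates for 627 and 726 would use their shares from the spreadsheet, which are not reproduced here.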

The U.S. Census includes racial data, which may be helpful in tailoring to certain populations. Surname data by state would be more helpful, but it does not appear to be available. A geographic breakdown could improve results in the test case.

Notable Facts

· Three patterns do not appear in this data: 577, 957, 959.

· Sorted by rank, the last 10% of surnames require 53% of the effort.

· Surname data from the 2010 Census was not compiled and is not available.

· Unlike the U.S., Canada has a large population of 2-letter surnames[4].

· Canada’s government does not release surname data.

Get The Full List

Thanks to Nick Roberts of Foundstone for supplying a Canadian point of view on the subject.


* Two-letter surnames were excluded. This reduces the coverage of the analysis by 0.25% to 89.50% of the total population, a negligible change. Since entering these surnames would require the first letter of the given name, these should be analyzed separately for the distribution of given names, with some consideration to the biases of ethnicity. The U.S. Census does not consider surnames with one character valid.

† Some references in this document extrapolate the Census data to include 100% of the population for clarity. The spreadsheet[3] lists percentages of both the sample data and the population as a whole for accuracy.

Tuesday, September 11, 2012

Malware Doesn't Care About Your Disclosure Policy, But You Better Have One Anyway

by Eireann Leverett @blackswanburst

All over the world, things are changing in ICS security—we are now in the spotlight and the only way forward is, well, forward. Consequently, I'm doing more reading than ever to keep up with technical issues, global incidents, and frameworks and policies that will ensure the security of our future.

From a security researcher's perspective, one exciting development is that .gov is starting to understand the need for disclosure in some cases. They have found that when they give companies lead time to implement fixes, they often get stonewalled for months or years. Yes, it sometimes takes years to fix specific ICS security issues, but that is no excuse for failing to contact the researcher and ICS-CERT with continually updated timelines. This is well reflected in the document we are about to review.

The Common Industrial Control System Vulnerability Disclosure Framework was published a bit before BlackHat/Defcon/BSidesLV, and I've just had some time to read it. The ICSJWG put this together and I would say that overall it is very informative.

For example, let's start with the final (and most blogged about) quote of the Executive Summary:

"Inconsistent disclosure policies have also contributed to a public perception of disorganization within the ICS security community."

I can't disagree with that; the lack of a policy has already contributed to many late nights for engineers.

On Page 7, we see a clarification of vulnerabilities found during customer audits that is commendable:
"Under standard audit contracts, the results of the audit are confidential to the organization customer and any party that they choose to share those results with. This allows for information to be passed back to the vendor without violating the terms of the audit. The standard contract will also prevent the auditing company from being able to disclose any findings publically. It is important to note however, that it is not required for a customer to pass audit results on to a vendor unless explicitly noted in their contract or software license agreement."

Is there a vendor who explicitly asks customers to report vulnerabilities in their license agreements? Why/why not?

On Page 9, Section 5 we find a dangerous claim, one that I would like to challenge as firmly and fairly as I can:
"Not disclosing an issue is not discussed; however it remains an option and may be appropriate in some scenarios."

Very well. I'm a reasonable guy who's even been known to support responsible disclosure despite the fact that it puts handcuffs on only the good guys. Being such a reasonable guy, I'm going to pretend I can accept the idea that a company selling industrial systems or devices might have a genuine reason not to disclose a security flaw to its customers. In the spirit of such a debate, I invite any vendor to comment on this blog post with a hypothetical scenario in which this is justified.

Hypothetically speaking: When is it appropriate to withhold vulnerabilities and not disclose them to your ICS customers?

While we're at it, we also see the age-old "disclosure always increases risk" trope again, here:
"Public Disclosure does increase risk to customers, as any information disclosed about the vulnerability is available to malicious individuals as well as to legitimate customers. If a vulnerability is disclosed publically prior to a fix being made available, or prior to an available fix being deployed to all customers, malicious parties may be able to use that information to impact customer operations."

Since I was bold enough to challenge all vendors to answer my question about when it is appropriate to remain silent, it's only fair to tackle a thorny issue from the document myself. Imagine you have a serious security flaw without a fix. The argument goes that you shouldn't disclose it publicly since that would increase the risk. However, what if the exploit were tightly constrained and detectable in 100% of cases? It seems clear that in this case, public disclosure gives your customers the best chance to DETECT exploitation rather than wait for the fix. Wouldn't that DECREASE risk? Unfortunately, until you can RELIABLY measure both risk and the occurrence of 0-day vulnerabilities in the wild, this is all just conjecture.

There exists a common misconception in vulnerability management that only the vendor can protect the customer by fixing an issue, and that public disclosure always increases risk. With public disclosure, you widen the circle of critical and innovative eyes, and a third party might be able to mitigate where the vendor cannot—for example, by using one of their own proprietary technologies.

Say, for example, that a couple of ICS vendors had partnered with an intrusion detection and prevention system company that is a known defender of industrial systems. They could then focus their early vulnerability analysis efforts on reliably detecting and mitigating exploits on the wire before the vulnerabilities are even fixed. This would reduce the number of days after day zero during which the exploit can't be detected and, to my thinking, that reduces the risk. I'm disappointed that, in the post-Stuxnet era, we continue to have ICS disclosure debates, because the malware authors ultimately don't even care. I can't help but notice that recent ICS malware authors weren't consulted about their "disclosure policies" and also didn't choose to offer them.

As much as I love a lively debate, I want to commend the ICSJWG for having the patience to explain disclosure when the rest of us get tired.