
12 May 2006

Comments


zanzibar

Clearly the domestic call records that the telcos provided were meant to enable pattern recognition, the step that precedes deciding on the warrantless wiretaps. It is easy to relate telephone numbers to people and to other personal information.

Data mining, which means uncovering patterns, is something that WalMart uses extensively to correlate seemingly unrelated variables when trying to determine causality in buying patterns. Mining the call records and email of tens of millions of Americans is surely compute intensive, and I wonder how far data mining software has progressed in the intelligence realm. In the commercial space, data mining software is still rather primitive.

The legal issues around the disclosure of call records by the telcos seem quite murky. As the USA Today article pointed out, Qwest lawyers wanted either FISA court approval or a more definitive legal opinion from the DoJ, which the spooks were unwilling to get, implying it would not have passed muster. Obviously AT&T, Verizon and Bellsouth lawyers did not see any issues.

At the end of the day, what I'd like to understand is how effective the traffic analysis and warrantless wiretapping have been in rolling up networks. What are the costs and benefits? Is it worth the erosion of constitutional liberties? Is the 4th Amendment an anachronism in the GWOT? If domestic call records of tens of millions of Americans are OK for the government to acquire, how about medical and financial records? Is there a line that should not be crossed?

jonst

It's what I call the crazy "six degrees of separation" theory of data analysis:

http://www.govexec.com/dailyfed/1205/120705nj1.htm

jonst

oh, and this is just the kind of stuff Brent Wilkes et al were bribing DC folks to gain contracts for. They are outsourcing the data collecting operations. Connect the dots folks, connect the dots.

ckrantz

I wonder about the effectiveness of the data mining. All hardcore terrorist groups should by now be aware that any public electronic communication can be tracked. It seems far more useful for gathering intelligence on a domestic population that uses open communications.

Quoting Wikipedia:
'The usefulness of traffic analysis can be reduced if traffic is faked or if traffic cannot be intercepted.'

kky

"Obviously AT&T, Verizon and Bellsouth lawyers did not see any issues." I would say they knew there were issues, but balanced them against a War on Terror defense and the need to please those who control the rules they play by.

searp

So now we have our own intelligence service surveilling... us.

I'd like to see the classification guide on this one.

"This program is classified because if the targets, millions of American citizens, ever found out we were doing this, we'd be in big trouble"

Seriously, it would be very interesting to know the justification for classifying this program. No sources and methods to protect here.

CJ

I am sure terrorists assume that the US government is aggressively fishing in the telecommunications streams. Some of the intelligence types can undoubtedly comment on the methods they use to get around or through the government's net. These days, only the American public is naive enough to believe that the Constitution and their privacy matter. Sad.

Happy Jack

"I wonder how far data mining software has progressed"

Just a guess, but I would say pretty far.

Curious


NSA == Now Spying on Americans

Sonoma

Apparently, it's not simply phone records, but e-mails, as well. From today's Washington Post:

"One government lawyer who has participated in negotiations with telecommunications providers said the Bush administration has argued that a company can turn over its entire database of customer records -- and even the stored content of calls and e-mails -- because customers "have consented to that" when they establish accounts".

MarcLord

At its core, this type of surveillance requires excellent audio mining capability to return accurate text values. You must analyze the signal you captured and figure out what the speakers actually said. This task is decidedly non-trivial, and quickly gets you into prediction and cognition.

Mathematically, you can use the same tools to analyze patterns (Bayesian Networks, Hidden Markov Models) and cluster them statistically for speech as you would for, say, air travel. You just need the data. Preferably all of it. Once you've got enough data assembled and have defined domain outcomes, you can figure out all the paths that led to, or theoretically lead to, those outcomes. Then you've built what's called a Finite State Machine...a method of predicting what someone wants in a topical domain, often before they know it themselves.

Then you flag the pathways that require human analysis, pass it to trained agents, and they listen and cross-check other databases to learn all about what someone wants. This is what has been built. It cannot be unbuilt.

Curious

Posted by: MarcLord | 12 May 2006 at 03:03 PM

But if you have several different types of data, you don't have to be precise. (It's sort of like Google image search: it doesn't really know what the image is, but it surmises from the file name, the text around the image, etc.)

So if the NSA collects emails, bank accounts and travel/ticket slips, then actually listening to the audio of a phone conversation won't come until the computer has pieced the whole thing together.

But the worst thing is that all the data collected by the NSA will not die. It lasts forever. 5-10 years from now, if somebody wants to carry out a digital vendetta against you, they have the material.

Imagine a scenario where, via the health records of your mother or father, somebody tracks your relatives and creates an engineered virus to bring you down.

Also, this sort of data collection is the ultimate national stupidity. If those databases get hacked, somebody can bring down the entire nation (say, financial panic, voting records, tax records, etc.).

It's been done before in other countries.

telecompro

The Call Detail Record (CDR) is the info captured by Telcos in order to generate their bills.
No CDR, no bill.
No bill, no revenue.
Every call generates a CDR.

Some of the data captured include:
1) Calling number
2) Called number
3) Call duration
4) 3-way calling (if used)
5) Call date and time

If millions of CDRs are given to the NSA and kept in a large database, the NSA can look back at any time to see who you called at any particular day and time.
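To make the "look back at any time" point concrete, here is a minimal sketch in Python of what such a lookup could look like. Everything in it is an assumption for illustration: the `CallDetailRecord` fields simply mirror the list above, the phone numbers are made up, and a real billing system or agency database would look nothing like an in-memory list.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class CallDetailRecord:
    """Hypothetical CDR with the fields listed above."""
    calling_number: str
    called_number: str
    start: datetime          # call date and time
    duration_seconds: int    # call duration
    three_way: bool = False  # whether 3-way calling was used


def calls_made_on(records, number, day):
    """Return every CDR where `number` placed a call on the given date."""
    return [
        r for r in records
        if r.calling_number == number and r.start.date() == day
    ]


# Made-up sample records, not real data.
records = [
    CallDetailRecord("555-0100", "555-0199", datetime(2006, 5, 1, 9, 30), 120),
    CallDetailRecord("555-0100", "555-0142", datetime(2006, 5, 2, 14, 5), 45),
    CallDetailRecord("555-0177", "555-0100", datetime(2006, 5, 1, 18, 0), 300, three_way=True),
]

hits = calls_made_on(records, "555-0100", date(2006, 5, 1))
called = [r.called_number for r in hits]
```

The point of the sketch is only that, once the records are retained, answering "who did this number call on this day" is a trivial filter.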

A CIA manager was recently fired for supposedly leaking classified info to Dana Priest of the WaPo.

My belief, after hearing that the telcos provide these records to the NSA, is that once Priest wrote her story, her CDRs were examined and the telephone number of the CIA person showed up.
Ergo, she was the source of the leak.

Similarly, the CDRs of the USAToday reporter who broke this story are probably being examined right now to see who (s)he called.

Note to journalists, politicians and other interested parties: they can look back at any time and see every number you called.
The content of the call is long lost, but they know who you call and when.

Beware.

linda

fwiw, this comment is taken from a post at firedoglake:

Surveillance American Style

It’s not just traffic analysis they’re doing. They’re trapping a massive amount of conversational audio and then turning it into text. This must be done in order to tag the words spoken in conversations and develop a string of probabilistic associations with later speaker actions. Note that none of the information here is from classified sources; I work on the automating technologies (recognition and synthesis) and with some of the best engineers for call processing and audio mining. The NSA uses the same techniques but has oodles more resources. Here’s how it goes:

At its core, this type of surveillance requires excellent audio mining capability to return accurate text values. To mine the audio, you must analyze the signal you captured and figure out what the speakers actually said. This task is decidedly non-trivial, and quickly gets you into prediction and will soon become entangled with cognition and measured brain activity.

Mathematically, you can use the same tools to analyze patterns (Bayesian Networks, Hidden Markov Models) and cluster them statistically for speech as you would for, say, air travel. You just need the data. Preferably all of it. Once you’ve got enough data assembled and have defined domain outcomes, you can figure out all the paths that led to, or theoretically lead to, those outcomes. Then you’ve built what’s called a Finite State Machine…a method of predicting what someone wants in a topical domain, often before they know it themselves.

Then you flag the pathways that require human analysis, pass it to trained agents, and they listen and cross-check other databases to learn all about what someone wants. This is what has been built. It cannot be unbuilt. When you call up Amazon someday, they’ll be calling the same kind of huge associations database a relevancy engine for catering to your latent desires.

linda

oh hell, i just now see that the author of the fdl post has already visited. never mind.

well, in other news, russell tice, a former nsa employee is scheduled to testify next week saying that it's much worse than anyone imagines. let's hope he is in a secure location.

http://thinkprogress.org/2006/05/12/more-unlawful-activity/

avedis

I am with telecompro, though he pulls up a little short.

My dept. uses datamining techniques and software to perform a sort of traffic analysis (as well as predictive modeling, cluster analysis, etc.) in the healthcare insurance sector. In fact, we use the same software - SAS - that the NSA probably uses. It's the best on the market and I know for a fact that the IRS, Medicare and other Gov't agencies rely on SAS for datamining.

The capabilities of this software are immense. It is fully capable - in the right hands - of mining text (converted from audio?).

Furthermore, the limited application that Bush keeps asserting is the scope of the project, makes no sense. It would neuter the purpose. You wouldn't get any actionable results.

If you're going to datamine phone calls, you're going to write programs that assign probabilities for ethnicity, you're going to need names, you're going to need locations, you're going to do sampling of audio text so you can test your algorithms.

Why stop there? Once you have a name and have surpassed thresholds for other characteristics it makes sense to datamine emails, credit card and other financial information, etc.

Notice (above) that I referenced "right hands". There are few people who would qualify. Even with the right hands it is very easy to make a mistake. Individuals in the datasets get assigned to the wrong cluster/group.

Think about a Google search, which is a form of datamining. Google is pretty good, but you still get some irrelevant hits.

The only way you can know that a hit is irrelevant is to dig in further, look at the context of the key words, etc.
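The irrelevant-hits problem gets worse as the population grows, and some back-of-the-envelope arithmetic shows why. The sketch below is Python, and every number in it is a made-up assumption chosen only to illustrate the base-rate effect; none of it comes from any real program.

```python
def flagged_breakdown(population, true_targets, hit_rate, false_alarm_rate):
    """Split the flagged records into real targets and innocent people.

    hit_rate:         fraction of real targets the model flags
    false_alarm_rate: fraction of everyone else the model flags by mistake
    """
    true_hits = true_targets * hit_rate
    false_hits = (population - true_targets) * false_alarm_rate
    precision = true_hits / (true_hits + false_hits)
    return true_hits, false_hits, precision


# Hypothetical inputs: 200 million people, 1,000 real targets,
# a model that catches 99% of targets and wrongly flags 0.1% of everyone else.
true_hits, false_hits, precision = flagged_breakdown(
    population=200_000_000,
    true_targets=1_000,
    hit_rate=0.99,
    false_alarm_rate=0.001,
)
```

Under these assumed numbers, fewer than 1 in 100 flagged records is a real target, so nearly all of the follow-up digging lands on people assigned to the wrong group.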

searp

Avedis: I agree absolutely on your point that the program cannot be as described. It was classified precisely to avoid having to disclose what they are doing to the public. If they are doing what they claim, there is absolutely no reason to avoid disclosure - recent polling confirms that this is not a political problem.

Call records are only the tip of the iceberg.

avedis

Even if only call records were involved, what Bush is asserting to the public is an insultingly stupid misdirection.

If sheik Yerbuti - a suspected terrorist living outside the US - makes calls to a certain Joe Greencard within the US and those calls become the subject of the NSA program, then certain subsequent steps must occur because terrorists are likely to be working in cells.

Who does Joe G. associate with? Joe G.'s phone records are a very logical place to start looking into the answer for that question.

Everyone that Joe has contacted via phone (and, again, realistically, email, etc) will come under scrutiny. They will be investigated in the attempt to learn more about Joe's - and ultimately Yerbuti's - world.

If Joe works for a small business, then his employer will be investigated; if he has called a baby sitter twice a week, then the baby sitter will be investigated...
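The chain described above is, in effect, a breadth-first walk over a graph of who called whom. A minimal Python sketch of that idea follows. It is only an illustration under stated assumptions: the function name `contacts_within`, the two-hop limit, and the extra `babysitter_mom` contact are all hypothetical, and nothing here is known to match any actual system.

```python
from collections import deque


def contacts_within(call_pairs, seed, max_hops):
    """Breadth-first walk of the call graph.

    Returns {number: hops from seed} for everyone reachable
    within `max_hops` calls of the seed.
    """
    # Treat each call as an undirected link between two parties.
    neighbors = {}
    for a, b in call_pairs:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    hops = {seed: 0}
    queue = deque([seed])
    while queue:
        current = queue.popleft()
        if hops[current] == max_hops:
            continue  # do not expand beyond the hop limit
        for other in neighbors.get(current, ()):
            if other not in hops:
                hops[other] = hops[current] + 1
                queue.append(other)
    return hops


# Made-up call graph following the example above.
call_pairs = [
    ("yerbuti", "joe"),
    ("joe", "employer"),
    ("joe", "babysitter"),
    ("babysitter", "babysitter_mom"),  # hypothetical extra contact
]
reach = contacts_within(call_pairs, "yerbuti", 2)
```

With a two-hop limit the employer and the baby sitter are already inside the net; raise the limit by one and the baby sitter's own contacts join them.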


MarcLord

avedis, telecompro,

thanks guys, and yes exactly.

I emphasized the effort to build acoustic and linguistic models because of its special privacy implications. It was necessary to use the actual surveilled conversations of US citizens whose only criminal act was to talk on a phone. Just as was done with credit card and travel records, once the models are built they just need to be updated. Only conversations are protected by privacy laws and FCC regs which are very clear, and still on the books.

As for the data-mining tool, they can no doubt drill down into anybody at any time, but unlike SAS they must be able to eavesdrop on conversations at will and record them in their entirety. Thus a massive number of conversations between innocent Americans have been stored on optical disk for no good reason. And no government could resist using such a tool to silence and punish political enemies, with Mary McCarthy being but one minor case.

sheik yerbuti. hehe.

Curious

Similarly, the CDRs of the USAToday reporter who broke this story are probably being examined right now to see who (s)he called.

Posted by: telecompro | 12 May 2006 at 10:10 PM

Of course they friggin better use a non-wireless, non-networked means of communication.

It'll be pretty stupid if somebody leaking the story does that. (But then again...)

We can check whether there will be further leaks. From the continuous, perfectly timed leaks, it is obvious the insiders and reporters are all learning how to work around networked communication.

And the NSA can't do jack, short of raiding the press offices or bugging them (but then the reporters will go somewhere else).

And once that bug is found, the scandal will explode even bigger.

Basically, the whole thing is about to explode big time.

Rider

Can it find terrorists? Has it ever? I'm sure it has retrospectively. My question is about the predictive value of data mining. We know that this system is a false lead generator of tremendous power. Does it work?

Curious

Posted by: Rider | 16 May 2006 at 07:04 AM

They are not going to find Osama that way. I can tell you that.

By now even a 13-year-old script kiddie knows they are being watched. (Hell, these mofos are following me around the internet and leaving traces all over the logs. This includes some weird Israeli database (definitely automatic sniffers) and dumbasses from the Iranian foreign ministry.)

I mean, what the hey? If they are tracing me around just because I am making comments on some blogs, they gotta trace WAY WAY more important and bigger targets...

I hope they all die killing each other, and finally leave the world in peace.
