MaS is about computer security, malware and spam issues in general.


CfC: Fifth International Summer School CfP

[PDF version]
Call for Contributions

Fifth International Summer School
organised jointly by the PrimeLife EU project
in cooperation with the IFIP WG 9.2, 9.6/11.7, 11.4, 11.6.
Privacy and Identity Management for Life
(PrimeLife/IFIP Summer School 2009)
to be held in Nice, France, 7th – 11th September 2009

New Internet developments pose greater and greater privacy dilemmas. In the Information Society, the need for individuals to protect their autonomy and retain control over their personal information is becoming more and more important. Today, information and communication technologies – and the people responsible for making decisions about them, designing, and implementing them – scarcely consider those requirements, thereby potentially putting individuals’ privacy at risk. The increasingly collaborative character of the Internet enables anyone to compose services and contribute and distribute information. It may become hard for individuals to manage and control information that concerns them and particularly how to eliminate outdated or unwanted personal information, thus leaving personal histories exposed permanently. These activities raise substantial new challenges for personal privacy at the technical, social, ethical, regulatory, and legal levels:

· How can privacy in emerging Internet applications such as collaborative scenarios and virtual communities be protected?

· What frameworks and technical tools could be utilised to maintain life-long privacy?

The theme of this Summer School to be held in September 2009 and co-organised by the PrimeLife EU project and the International Federation for Information Processing (IFIP) will be on privacy and identity management for emerging Internet applications throughout a person’s life.

Both IFIP and PrimeLife take a holistic approach to technology and support interdisciplinary exchange. Participants’ contributions that combine technical, legal, regulatory, socio-economic, ethical, philosophical, or psychological perspectives are especially welcome.

Contributions from students who are preparing master's or doctoral theses will be especially welcome. The school is interactive in character and is composed of keynote lectures as well as seminars, tutorials and workshops with PhD student presentations. The principle is to encourage young academic and industry entrants to the privacy and identity management world to share their own ideas and to build up a collegial relationship with others. Students who actively participate, in particular those who present a paper, can receive a course certificate which awards 3 ECTS at the PhD level. The certificate can state the topic of the contributed paper, making clear whether or not it relates to the student's PhD thesis.

Topics of interest include but are not limited to:
- privacy and identity management (application scenarios/use cases, technologies, infrastructures, usability aspects)
- privacy-enhancing technologies
- anonymity and pseudonymity
- transparency-enhancing tools
- privacy and trust policies
- privacy-aware web service composition
- privacy metrics
- trust management and reputation systems
- assurance evaluation and control
- privacy in complex emerging real-life scenarios
- the use of privacy-enhancing mechanisms in various application areas that are often life-long in character such as eLearning, eHealth, or LBS
- life-long privacy challenges and sustainable privacy and identity management
- privacy issues relating to social networks, social network analysis, profiling
- privacy aspects of RFID and tracking technologies, biometrics
- surveillance, data retention, availability and other legal-regulatory aspects,
- socio-economic aspects of privacy and identity management, and
- impact on social exclusion/digital divide/cultural aspects.

Contributions will be selected based on an extended abstract review by the Summer School Programme Committee. Accepted short versions of papers will be made available to all participants in the Summer School Pre-Proceedings. After the Summer School, authors will have the opportunity to submit their final full papers (which will address questions and aspects raised during the Summer School) for publication in the Summer School Proceedings published by the official IFIP publisher. The papers to be included in the Final Proceedings published by Springer (or the official IFIP publisher) will again be reviewed and selected by the Summer School Programme Committee.

Summer School Website:

The submission address for extended abstracts (2-4 pages in length) will be accessible via the Summer School Website.

Submission deadline: May 14, 2009
Notification of acceptance: June 18, 2009
Short paper (up to 6 pages) for the Pre-Proceedings: August 11, 2009

General Chair:
Michele Bezzi (SAP Research/ France)

Programme Committee Co-Chairs:
Penny Duquenoy (Middlesex University/ UK, IFIP WG 9.2 chair)
Simone Fischer-Hübner (Karlstad University/ Sweden, IFIP WG11.6 vice chair)
Marit Hansen (Independent Centre for Privacy Protection Schleswig-Holstein, Kiel/ Germany)

Programme Committee:
Jan Camenisch (IBM Research/ Switzerland, IFIP WP 11.4 chair)
Mark Gasson (University of Reading/ UK)
Hans Hedbom (Karlstad University/ Sweden)
Tom Keenan (University of Calgary/ Canada)
Dogan Kesdogan (Siegen University/ Germany)
Kai Kimppa (University of Turku/ Finland)
Eleni Kosta (KU Leuven/ Belgium)
Elisabeth de Leeuw (Ordina/ Netherlands, IFIP WG 11.6 chair)
Marc van Lieshout (Joint Research Centre/ Spain)
Javier Lopez (University of Malaga/ Spain)
Vaclav Matyas (Masaryk University, Brno/ Czech Republic)
Martin Meints (Independent Centre for Privacy Protection Schleswig-Holstein, Kiel/ Germany)
Jean-Christophe Pazzaglia (SAP Research/France)
Uli Pinsdorf (Europäisches Microsoft Innovations Center GmbH (EMIC)/ Germany)
Andreas Pfitzmann (TU Dresden/ Germany)
Charles Raab (University of Edinburgh/ UK)
Kai Rannenberg (Goethe University Frankfurt/ Germany, IFIP TC11 chair)
Dieter Sommer (IBM Research/ Switzerland)
Sandra Steinbrecher (TU Dresden/ Germany)
Morton Swimmer (John Jay College of Criminal Justice, CUNY/ USA)
Jozef Vyskoc (VaF/ Slovakia)
Rigo Wenning (W3C/ France)
Diane Whitehouse (The Castlegate Consultancy/ UK)

Organising Committee Chair:
Jean-Christophe Pazzaglia (SAP Research/ France)


Announcing the local New York chapter of the Heart Project

Although we haven't formally been accepted, I wanted to announce the formation of the local chapter of the Heart Project, which I've called "I heart New York". The idea is to participate in the development of an RDF store for Hadoop/HBase. The main project is based in Korea, which is just a bit too far for most people to travel, but there is quite a bit of interest in very large RDF databases here, so it seemed a good idea to have a local group. It will be attached to the NYC Semantic Web meetup group that Marco Neumann organizes, which is already one of the world's largest semantic web interest groups.
So, why am I interested, being the security geek that I am? Well, RDF and semantic web technology interests me in two ways. First, there is its use in Data Centric Security. Second, there is the encoding, exchange and reasoning over security-relevant data expressed in RDF, or at the very least, using constrained (and well-defined) vocabularies. However, while looking at the amount of data that we at Trend Micro collect, I realized that no current system can handle it all. Furthermore, since we are working with a Hadoop infrastructure, it would be appropriate to leverage it. This led me to Heart.
If you are interested in the Heart project I'd encourage you to join in and if you are a New York local, then join our chapter, too!


Metrocards and PII

So, I guess I'm not surprised, but Metrocards do contain ID information allowing the user to be tracked, see the New York Times article on a recent case. If you bought your card with a credit or debit card, then you can be identified, too.
I guess this has to be considered a normal intrusion on our privacy nowadays -- along with credit cards, social security numbers, EZ-Pass fobs, ...


[picture by Darny, used under a Creative Commons license.]


A few FCW Tournament photos

Hi all,

It's going to take a while to sift through all the photos I took at
the tournament and this is going to be a busy week for me. However,
I've posted a few photos from the trophy receiving ceremony for now.
It was already night when the Blue team finished their game (in the
dark) so I only have one photo available of them waiting for the
trophies. There are a few more photos of the White team and a group
photo with the trophies.


Cheers, Morton


I was reading a New York Times article titled "Agency’s ’04 Rule Let Banks Pile Up New Debt". It is a pretty damning article on the SEC, describing a quiet decision to allow investment banks to take on more debt than previously allowed, under the assumption that the banks could manage their risk better with their newfangled computer models. This allowed Bear Stearns (R.I.P.) to raise its leverage ratio to 33:1, which seems extraordinarily high. Anyway, while reading it I stumbled over this paragraph:
A lone dissenter — a software consultant and expert on risk management — weighed in from Indiana with a two-page letter to warn the commission that the move was a grave mistake. He never heard back from Washington.
The software consultant was Leonard D. Bole, of Valparaiso, Ind., and he was expressing doubts that computer models could protect companies, given that they had failed to do so in the collapse of a hedge fund in 1998 and the market plunge in 1987. While I have my doubts that any computer model can calculate risk well enough, and certainly increasing allowed leverage ratios seems just plain daft, I think the current credit crisis now comes down to trust. Or the lack of it.
So, if it is a trust problem, how would a computer scientist approach the problem? First of all, I need to point out that trust is really a human issue, so there is a limit to how much computers can help, just as I doubt we can model risk. However, one of the problems is that there is a certain degree of mortgages that are of too high risk, but banks don't know what their exact exposure is, let alone that of their competitors. The result is that no one trusts each other and the capital market has suffered a form of seizure or heart attack.
A couple of years ago I was leading a project exploring Data Centric Security, and as a part of my research I looked into provenance. We never had time to weave it into the model properly, but we identified it as an important aspect that eventually needed to be included. But, wait. What is provenance?
Take paper. Paper documents have great provenance. You fill out a form, hand it in. It gets handled, gets coffee stains over it, stapled to other documents, stamped, filed, refiled, etc. By examining a paper document you get a feeling for where that document has been and what it went through. That is provenance. 
Unfortunately, electronic documents don't have provenance out of the box. Luckily, there has been some research into how provenance can be added. The project I was exposed to at IBM Research was the EU Provenance Project, part of the European Commission's Sixth Framework Programme, bless their cotton socks. Their proposed architecture, if I remember correctly, was to place hooks in document processing which record document use (CRUD operations: create, read, update, delete). Though I'm not sure that is the way I would have done it, it would certainly work unless someone cheated or didn't implement the hooks, though I assume that would be uncovered the next time the provenance recording system saw the document.
How would provenance help in the credit crisis? If we just isolate the problem of sub-prime mortgages (and my brother, who knows much more about the financial industry, assures me that there are a whole pile of other problems) it does look like a provenance problem to me. From my perspective as an outsider, what seemed to be happening was that these sub-prime mortgages were being sold, repackaged with other debt, sold again and so on. In the end, the last one in the chain didn't know what he or she was actually getting. The lack of provenance of these aggregate debt packages meant the risk couldn't be calculated sufficiently well (in itself a dubious undertaking, but made even more difficult in this case).
Remember that every financial instrument is really just a document of sorts that we attach a value to. The document has no intrinsic value. Take currency: the dollar bill has no real value. You can't eat it. It doesn't produce a lot of energy when burned. However, we place a certain amount of trust in it because the intricate design and the type of paper tell us that it comes from a trusted source: in this case the US Treasury. The provenance of this bill lets us accept that there is little risk that its extrinsic value is not one US dollar.
When aggregating debt from multiple sources you need to collect the provenance of all the included debt documents. This allows you to better estimate the risk associated with the aggregate debt and also to find inconsistencies that I really, really hope don't exist, like circular provenance (which would be similar to a Ponzi scheme). It would also allow the banks to identify the bad parts of the debts and calculate their exposure, which is something that they don't seem to be able to do at the moment. If they could, they would probably find that the bad debt they own is not as bad as it could be, and there would be less uncertainty. Amongst other things, it is the uncertainty about exposure to bad debt that has resulted in the credit crisis.
While not all the problems that banks are facing can be solved by computer scientists or mathematicians, and you can argue that we have been instrumental in getting us into this mess, provenance standards for financial documents would go a long way to alleviating the problems we have at the moment.


I survived VB 2008

The Virus Bulletin Conference is probably the most important anti-malware conference there is. It is also the oldest surviving one. I have only been attending since 1995, as it was just too expensive when I was a student.
This year, it was in Ottawa, Canada's capital. The conference switches sides of the Atlantic every year, but since 2001 it has not been possible to hold it in the USA because some delegates cannot or will not travel to the US. That said, Canada is a great place to go, though VB is starting to run out of likely venues.
There were no real eye-openers in the presentations I saw, but there was a constant flow of useful snippets of information. Luckily, my talk was the first after the keynote, so I could enjoy the rest of the conference. 
The real value of this conference, as with nearly every one, is the networking one does. I had quite a few hallway chats with delegates and speakers, and I've come to realize that these chats are what makes the industry function. It builds trust in an industry where misplaced trust could be dangerous. 
What I really noticed this year was that photography seems to be a very popular hobby. I've put my own photos on pbase, but thought it might be fun to start a Flickr photo group for the amateurs in the anti-virus industry. (I actually prefer pbase for more serious work, but more people are already on Flickr.)
So, now, after four 18-hour days and too much food and alcohol, I'm in rehab mode. It was fun, but I'm glad there is only one VB conference a year.

I confess to world domination

Graham Cluley, of Sophos, filmed various anti-virus researchers on a variety of silly subjects at the Virus Bulletin 2008 conference. I was one of them. I'll have to confess that I was one of the few to see the questions beforehand, so I knew what was coming. In the spirit of things, I decided to be totally silly about it. Enjoy. Or cringe. Your choice.


We may be losing

Yesterday (26 Sept. 2008), I was on a panel at the Polytechnic Institute of NYU to discuss targeted malware. I chose the title above for my introductory talk to be provocative, but, in truth, the security industry is in reactive mode and is slow even at that. I wanted to outline what I think the problems are in a nutshell, and what the intended interdisciplinary audience could be thinking of doing about them. Because I was trying to stick to my allotted 5 minutes, I didn't get through all the material, so here is what I was trying to communicate:


The tendency to overcomplicate design

We overcomplicate systems in two ways: (1) in engineering we seem to want to use the hammer that we know and treat everything as a nail. We are also very much caught up in a legacy mode of thinking. OSes like OS390, Windows and Linux are examples, and the same goes for application frameworks. We try to shovel every problem into these without considering whether it makes sense, which results in overcomplicated designs that are impossible to understand and audit. (2) The systems are also very hard for the user to actually understand. We design the UI in a way that hides the internals and is alien to how people think the system goes about its business. The fact that a spammer can abuse the email sender field is a total HCI failure.
Furthermore, we never seem to consider security and privacy in the initial design. “Let’s get this out first, and see if people like it.” We lived with Macro viruses for such a long time because Microsoft thought that macros in documents might be a good idea. Only when they disabled running macros in documents by default did the problem largely disappear - and no one thought it was a big loss.
User education continues to fail, as we can barely explain things to each other, let alone to non-technical users. Also, the landscape continues to change too quickly. For the longest time, I would say that documents could not be infected, hoping that it would stay true even when we were aware of the possibility. And then WM/Concept hit the scene in 1995...
We are fighting an ecology of cyber criminals

This is not about ‘one criminal’ or ‘one gang’. There is a vast network of service providers that the principal perpetrator uses for nearly every aspect of the crime. It is a surprise that they have not adopted a SOA architecture yet. For law enforcement, it is extremely difficult to find every party involved, and then they usually find that many of the actors are out of their jurisdiction and the principal perpetrator is heavily shielded.
The 80/20 rule seems to apply. They can get the most profit out of perhaps 20% of the potential targets. These are the people who have not updated their systems and kept them secure enough. I would like to say they must be gullible, but these attacks have become very sophisticated, and not enough has been done to make it easy to spot the deception. There are also attacks against high-value targets, but these are rare for a variety of reasons. First, the economies of scale are not favorable, and because of the methods used, the perpetrators are more easily traced. These attacks exist and are probably being used in IP theft and patriotically motivated citizen cyber warfare (e.g., Estonia and Georgia).
Security vendors unfortunately have the same problem. They have to go after the 20% of the potential vulnerabilities that they feel will be responsible for 80% of the attacks. Covering all bases is impossible, and the landscape is constantly changing anyway. I know this because I typically look further down the pipeline than most.
The trick is to make it not worth the criminal’s while. If a bank can limit the potential payout, make it inconvenient enough or create a mandatory delay in processing to give law enforcement an edge, they will move elsewhere. You still have to be vigilant as you don’t want to become the low-hanging fruit.

Dealing with the problem

System and service providers must assume the risk

EULAs and Terms of Use typically release the system provider from any responsibility to produce secure software. This is a mistake and probably shouldn't be legal. A system of insurance for managing risk should be instituted to deal with the security risks in software, as the insurance industry has a lot of experience with risk mitigation and will find a good balance.
There is a desperate need to over-engineer software, especially with respect to security. The problem is that current economics don't encourage this practice. It is hard to become a car manufacturer because building a car to withstand all the stresses on it is something learned through long experience. But failure to do so is no longer tolerated: there are regulatory and litigation pressures to do one's best. Interestingly, regulation doesn't specify methods, but outcomes. While I don't like the car analogy too much, I think similar ideas could come into play in the software industry.
Software must also be less brittle. Brittle in this context means that your system interaction breaks down into an inhuman experience when things go wrong. A system shouldn't fall apart when confronted with unusual input. The failure mode should be rooted in common sense. System design must be rooted in human expectation and not just in machine feasibility.

Reactive Security

Lastly, security needs to get better at reacting - it's not reactive enough. Diagnosis needs to occur much closer to the user so that timeliness and context are not lost. At the moment, malware is collected in an ad hoc fashion, and signatures are created and deployed. The time between a threat being deployed and being detected is far too long. We tried to fix this with the Digital Immune System, but tragically it was never deployed as expected and is now dead with no apparent replacement. Perhaps better solutions are still being shunned by customers because they are often heuristics-based, while traditional solutions are preferred as they are perceived as having a more deterministic outcome. However, this is faulty thinking. The various solutions have to work together in concert, but for this to happen, they must understand each other.

This is pretty dense, but even though I compressed it even further when speaking, I think I may have gone overtime. Of course, so did everyone else, so there was not really enough time left for a proper discussion. But in the breaks and over lunch and dinner these topics came up again and again.
It was a great workshop and fun to meet old friends and make new ones. Thanks to the organizers who put it together. It would be interesting to attend the next time, too!
For students, there is a similar workshop held nearly every year and organized by IFIP: the IFIP Summer School on Security, Privacy and Society organized by IFIP WG 9.2, 9.6/11.7 and 11.6. The last one was in the Czech Republic. As a student, you can get credit for participating.


UPDATED: Workshop on Interdisciplinary Studies in Security and Privacy

On Friday Sept. 26, I'm going to have to get up unbelievably early to get to a panel on Targeted Malware at the Workshop on Interdisciplinary Studies in Security and Privacy hosted by the Polytechnic Institute of NYU. Gosh, was that gratuitous or what?
The panel position statement begins with:
Malware undermines trust in information systems. To a certain extent, our success as information system engineers can be measured in terms of the amount of trust that society puts in the systems we have built. Malware, therefore, threatens our success, hinders the acceptance of technologies, and could even potentially reverse the progress that has already been made. The situation is not purely technical. Improved technology can sometimes help (e.g., better software quality), but practical solutions to current and future problems with malware will likely involve a mixture of techniques from multiple areas.
I think I can go along with most of that. Luckily, there are points later in the text where I deviate in opinion, so it should be interesting.
Anyway, it will be nice to go back to what was once the Brooklyn Polytechnic, where I spent a while shepherding a very interesting project that Prof. Phyllis Frankl was leading. I can't tell you what it was, except that it was cool new malware detection technology that never made it into a product, as Symantec bought IBM AntiVirus around that time and apparently wasn't interested, but it did influence my further research.
If anyone wants to attend the panel (please leave the rotten fruit at home), you have to be invited and you can try your luck on the registration page.

For reasons I can't explain, I had Thursday marked down in my calendar and not Friday, which is the correct date. I'll blame it on the financial crisis :-)


DARPA wants to make soldiers more easily targetable

A very long time ago - nearly another life ago - I helped a military contractor that had suffered a security breach try to determine the scope of the breach. In the process I learned quite a bit about battlefield communications. So, you can imagine my surprise to read this article in The Register about outfitting every soldier with a long-distance readable RFID tag (not the type in the picture on the right, by the way). Readable from 150 km, no less! Considering all the pains the contractor I worked with took to prevent any form of RF from being emitted, I find this technology rather bizarre.
I could imagine the tag being useful for training and maneuvers. I could also imagine it being useful as a last resort for locating a soldier lost or wounded in theater, and to be fair, the referenced presentation mentions (once) that the tag is 'inert', which may mean that it needs to be activated before it sends a beacon. That might be an acceptable application.
What worries me is that UAVs are now so cheap and accessible that I could easily imagine even small states affording a small fleet of UAVs that swarm over enemy troops in theater and home in on these tags. Or on other forms of identifiable RF radiation, such as from FCW Land Terminals.
I hope they know what they are doing, but I have my doubts.


Insider hacks own system

ABC reports that a San Francisco employee created "virtually exclusive access to most of the city's municipal data." And I thought those days were long gone when that would be possible. 
However, it doesn't sound like something that a good computer forensics expert couldn't solve.


I really like this idea from Symantec: paper-based malware. Somehow this way of raising awareness of the problems of bots appeals to me.
I can't be sure that ideas like that really work, but one can hope. Anyway, better to have a bot next to one's PC than in it!
But what's with that Norton Today site? No RSS feed? Come on! 


Semantic Web Meetup June 1, 2008

What would you do on a day of perfect weather in New York City? Attend an all-day code camp on Semantic Web programming in Brooklyn of course! OK, I guess I would have preferred it to be a rainy day if I had to be inside, but it still was worth it. I learned a lot about Semantic Web programming and more importantly, realized that the technology is closer to being reality than before. This is a report on what I learned.
The event was organized by Marco Neumann and hosted by Breck Baldwin of Alias-i. After bagels that Marco had brought along, and introductions, we got a brief run-down of some of the concepts and technologies. This was followed by quick descriptions of the projects we were to tackle at the meetup. After a rather late lunch we chose our projects and had a few hours to complete them. In theory, we were supposed to use the Extreme Programming paradigm, but that devolved a bit into group programming interspersed with discussion.
I don't really want to go into the projects in detail. I was interested in two of them: the Natural Language Processing project headed by Breck, our host, and a spatial reasoning project headed by Marco. The actual projects were not that important, though; the programming aspects were. I was at a disadvantage, as it turned out that Java is king when it comes to semantic web programming, and I've been doing my programming in Ruby and Erlang for over a year. Semantic Web support for Ruby is not great, and it's essentially nonexistent in Erlang.
In Java, the way to go was the Jena library. Jena started at HP but has in the meantime become a SourceForge project. It now offers support for RDF, RDFS, OWL and SPARQL. It also supports reading and writing RDF in RDF/XML, N3, N-Triple and, I believe, Turtle. There was some discussion of the strengths and weaknesses of these formats. The rough consensus was that N3 and N-Triple are more human-readable, but RDF/XML is more expressive, at least from a syntactical standpoint. It wasn't clear to me whether there was any semantic difference. In the NLP project, Jena was used to emit RDF, initially in N3 format, though that was quickly changed to RDF/XML. Once that was done for a subset of the data, a SPARQL query was hacked together (again using Jena) that used that file. All in all, it didn't require that much real code, though given that it was Java there was all sorts of fluff surrounding it.
On a side note, one of the participants showed us some of his Groovy code, and I must say that Groovy might get me back in Java again. It's like a less wordy version of Java, or perhaps a Java that has been put on a diet by the Ruby camp. When Groovy is mentioned, I guess you have to mention Scala as well. Both seem to be taking Java beyond the confines of the actual language, Java, by leveraging Java, the virtual machine and all the libraries that are available as Jars. 
Apart from the programming, there were a few other things I picked up. In the past I had been using Protégé. However, apparently this is no longer the way to go. TopBraid Composer, a tool based on the Eclipse platform and Jena, has usurped Protégé from its throne. Apparently it is free for non-commercial use, though that is unclear from the website, as it does say that you need to purchase a license after 30 days.
One of the other projects looked at transforming a relational database into an RDF database using D2RQ. There is a paper at W3 that describes the idea. From what I gather, this is nearly equivalent to trying to derive semantics from database schemata - not something that can really be mechanized. There are also all sorts of performance issues to address if a production database were to be stored as an RDF database, but perhaps it is too early to discuss those, as we first need to understand why we need this at all. If it means that we can elevate the data in a database to the level of information, it might be worth it, though. Since there seem to be all sorts of expressivity issues when comparing traditional databases to RDF stores, perhaps the right approach would be to develop new applications based on RDF first and only then try to transform existing databases.
Another subject that came up was the difference between ABox and TBox reasoning. ABox reasoning is based on assertions about individuals (i.e., the rows of data, to use a database table analogy), whereas TBox reasoning is based on concepts (i.e., the schema of a database table).
So, what does all this have to do with security? There are two aspects of this. 
  1. The security of the Semantic Web metadata
  2. Using Semantic Web technology to secure our data
The first aspect is certainly not a trivial one. Metadata has already caused embarrassment to many people, including Tony Blair, who don't realize that there is more data in a typical document than (literally) meets the eye. In computer forensics, this is what we live for. However, as more webpages get semantic data attached to them, more data may be transmitted than is shown to the user, and now it can be read automatically. Privacy advocates will be all over this problem, but corporations will have to pay attention, too.
However, what I am more interested in is the use of this metadata and the technologies of the Semantic Web to define and enforce security. At IBM, this is called Data-Centric Security, and as far as I can tell, they are working on database security using taxonomies for classification. What the NLP project showed me is that, to some degree, we could also create a content-based security system at some point in time. Alias-i and OpenCalais might be the key.
What the code camp showed me is that the technology has reached the point where it is usable. While security is almost never a business case in itself, there will be other, more motivating reasons to use semantic metadata in corporations, and that will enable ideas such as DCS.


I just read about a Canadian bank handing out ASUS Eee PCs. Now, a while ago, when dealing with a bank phishing case, I suggested that hypothetically a bank could offer a simple laptop to its customers solely to do banking with. Of course, that is a problematic proposition that a single bank may not actually want to venture into on its own. We went over perhaps more practical solutions, like booting from a Knoppix-like CD that could only be used for bank transactions, or using a VMware image. None were adopted, of course.

Now RBC is offering a free ASUS Eee PC as an incentive, but I keep thinking that having a dedicated cheap laptop would be a good idea for the general population. The Eee is a small and basic Linux system, but it supports the one important thing: Firefox, a browser usable for e-banking. While I'm not a fan of remote attestation on practicality grounds, I could see it working in this case of a dedicated e-banking machine.

Would banks want to get into this business? I don't think so. However, an enterprising company might be able to convince enough banks to offer such a device to their customers and offer the security maintenance of such systems as a service.


Virus calendar

Gosh, what a blast from the past. Way, way back in the annals of time it was actually possible to analyze malware (OK, viruses mainly) well enough to know all the trigger dates of their payloads. So, I thought I'd create a virus calendar. I think the first was created for the year 1990 or 1991 for the Virus Test Center, so that we'd have something fun to show at expos like CeBIT. Then I got a commission to create one (and later another one) for perComp Verlag, which they dug up recently and posted on their site for the years 1992 and 1993. While the 1992 version was mainly my work, I think I only contributed data and ideas to the 1993 version. Apparently some people in Germany still show excerpts of these calendars in presentations, though I shudder to think why. I think S&S International Ltd, UK, also created a few calendars based on their own data and graphics, but after 1995 it became impossible to analyze all known viruses - and even if it were possible, it would have made for one crowded calendar!
This is one of those rare moments where I was able to do something graphical in the context of computer security.
I think when I get back home, I'll dig out the original and scan that in, too!


OT: Splint

Just when I had accumulated a few things to blog about again (though time is still a precious resource), I managed to hurt my middle finger badly enough to warrant a splint. My thumb was already hurting from a small skiing accident. That is going to put a damper on my blogging and other writing.
It's going to have to stay on for six weeks. Ugh. I guess I should look around for one-handed keyboards!


Survey of disassemblers

As preparation for my course on Data Communications Forensics and Security this term, I've decided to do a quick survey of disassemblers. The problem is that I've been writing my own disassemblers for special purposes, but I need something more general-purpose for the students. Also, the code I wrote stayed with IBM when I left. Here is a quick survey of what I've found so far, in no particular order.
  1. Let's start out with the reigning king of disassembly, IDA Pro. This is more a disassembler framework than just a disassembler. As of January 2008 it has moved from the old DataRescue website (and presumably distributor) to the new Hex-Rays site. It's up to version 5.2 and there are quite a few plug-ins for it - this is clearly the strength of IDA Pro. Unfortunately, they want serious money for it and the university isn't interested in paying. I'm also a bit concerned about the move to Hex-Rays. What does it mean? Will it survive? I'd also like something that comes with source code.
  2. Sourcer doesn't seem to exist any more; V Communications' website doesn't list it. Sourcer used to be my favorite disassembler before I got into writing my own.
  3. Apparently ASMGen is still around, but I think it is stuck in the 16-bit MS-DOS world. It was basic back then and must be antique now. I'll give it a spin and see.
  4. Jean-Louis SEIGNE's disasm32 is apparently a VxD disassembler according to his own website, and it is available via WinSite. (Another website seems to indicate it is a visual disassembler; I'll find out when I get the chance to run it.) It seems to be at least 12 years old, so I don't think it will be that interesting.
  5. I can't find WDASM, so it is probably dead.
  6. Obj2asm is an MS-DOS object file disassembler and is available on Simtel.
  7. The New Jersey Machine-Code Toolkit also seems to have been discontinued, back in 1998. It's written in SML and utilizes a machine model for disassembly (amongst other things), which should give it a lot of flexibility. However, that doesn't help if the project has been abandoned.
  8. GNU offers a lineup of surprisingly useful tools in its binutils package. Quoting: "nm - Lists symbols from object files. objdump - Displays information from object files. readelf - Displays information from any ELF format object file. strings - Lists printable strings from files." They are meant for UNIX and so are not that useful for Windows. In theory they should be able to handle PE files, but they are not robust or endian-agnostic.
  9. OllyDbg is not a disassembler but a debugger. However, quite a few people use it for program analysis, either to aid the disassembly or to produce the disassembly. It's free and one of the best.
  10. Although not actually a disassembler, REC attempts to decompile from binary to source. It uses the Netwide Disassembler for preprocessing, according to the documentation.
  11. The Netwide Disassembler is part of the Netwide Assembler project. It doesn't understand the various binary file formats itself, so you have to give it the naked binary code. I've used this and objdump in my projects. It is far more useful than it sounds. Consider that a certain amount of malware can only be snagged from memory.
  12. Boomerang is another decompiler (versus a disassembler). It was active until 2006, so I'll have to see where it stands.
  13. The diStorm project looks very interesting to me in that they want to create a really good library for disassembly, not just a disassembler. This will not be for the casual user. The core library is written in C (source is available, I think) and it interfaces with Python, which wouldn't have been my choice. They have also separated the opcode tables from the code (again, according to the documentation), which makes the code easier to repurpose, though I always wonder how much real mileage you get from that.
The Open Directory Project lists a few more here.
Another site worth mentioning is Wotsit, which has been very useful over the years for figuring out various binary file formats (I'm a file format hacker at heart, though long inactive). You need this site to make sense of the various binary file formats.
So, the next step in this exercise is to evaluate the best candidates and see how well they will do in practice. That will be in some later post.