Hardware Database

Wed Nov 5 22:03:56 UTC 2003

I've been thinking about something like that over the past few weeks.

I believe a project to collect and exploit hardware information
could be split into three parts:

1. Software to collect the hardware data using various tools and
produce a hardware data file.

2. Storage servers to collect user submitted hardware data files.

3. Software or web service to exploit the data files in storage
servers and their mirrors.

The simplest possible thing I can think of that would work under this
scheme:

1. The file format could simply be a compressed text file recording the
collector version used and each command issued, followed by its output
without any filtering:

bash$ simple_lhw_collector.sh | gzip > my_lhw_data.gz
bash$ zcat my_lhw_data.gz
%lhwdata:version:simple_lhw_collector.sh version 0.0.0-pre-alpha0-ok-you-re-warned
%lhwdata:usertime:20031105T2223Z
%lhwdata:userinfo
Anonymous (default of course)
%lhwdata:command:cat /proc/version 
Linux version 2.4.20-20.9 (bhcompile at stripples.devel.redhat.com) (gcc version 3.2.2 20030222 (Red Hat Linux 3.2.2-5)) #1 Mon Aug 18 11:45:58 EDT 2003
%lhwdata:command:lspci
00:00.0 Host bridge: Intel Corp. 82815 815 Chipset Host Bridge and Memory Controller Hub (rev 02)
00:01.0 PCI bridge: Intel Corp. 82815 815 Chipset AGP Bridge (rev 02)
00:1e.0 PCI bridge: Intel Corp. 82801BAM/CAM PCI Bridge (rev 02)
00:1f.0 ISA bridge: Intel Corp. 82801BAM ISA Bridge (LPC) (rev 02)
00:1f.1 IDE interface: Intel Corp. 82801BAM IDE U100 (rev 02)
00:1f.2 USB Controller: Intel Corp. 82801BA/BAM USB (Hub #1) (rev 02)
01:00.0 VGA compatible controller: nVidia Corporation NV11 [GeForce2 Go] (rev b2)
02:03.0 Multimedia audio controller: ESS Technology ES1983S Maestro-3i PCI Audio Accelerator (rev 10)
02:06.0 Communication controller: Lucent Microelectronics WinModem 56k (rev 01)
02:0f.0 CardBus bridge: Texas Instruments PCI4451 PC card Cardbus Controller
02:0f.1 CardBus bridge: Texas Instruments PCI4451 PC card Cardbus Controller
02:0f.2 FireWire (IEEE 1394): Texas Instruments PCI4451 IEEE-1394 Controller
07:00.0 USB Controller: NEC Corporation USB (rev 41)
07:00.1 USB Controller: NEC Corporation USB (rev 41)
07:00.2 USB Controller: NEC Corporation USB 2.0 (rev 02)
%lhwdata:command:...
...
%lhwdata:end
bash$

This should let us survive future changes in Linux userland tools, and
let people developing more serious hardware data file formats or
databases get all the data they need. My estimate is that such a file
should be under 20k compressed.

User information should of course be opt-in, with the user stating
whether she wants to be contacted when new drivers or the like become
available, so she can help with testing.

Developers are free to write whatever lhw file generator they like
(TUI, GUI, Qt, GTK+, PHP, ...), even enterprise lhw collectors running
as a sophisticated distributed system to collect data on your company's
hardware (OK, maybe a bit too much :).
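As a rough illustration of the simplest such generator, here is a guess at what simple_lhw_collector.sh from the session above might contain. Its real contents are not given, so everything here is an assumption: the helper names lhw_section and lhw_collect, the LHW_VERSION and LHW_USERINFO variables, and the default command list are all hypothetical. It only shows the idea of running commands and recording their raw output between "%lhwdata:" headers.

```bash
#!/bin/bash
# Hypothetical sketch of simple_lhw_collector.sh (contents assumed, not specified):
# runs each command and records its raw, unfiltered output in the %lhwdata format.

LHW_VERSION="simple_lhw_collector.sh version 0.0.0-pre-alpha0"

# Print a "%lhwdata:command:" header, then the command's stdout and stderr,
# so a missing tool is recorded instead of silently dropped.
lhw_section() {
    local out
    printf '%%lhwdata:command:%s\n' "$1"
    out=$(bash -c "$1" 2>&1)
    # $(...) strips trailing newlines; add exactly one back so the
    # next header always starts on its own line.
    printf '%s\n' "$out"
}

# Emit a complete data file: version, UTC time, user info, one section
# per command given as an argument, then the end marker.
lhw_collect() {
    local cmd
    printf '%%lhwdata:version:%s\n' "$LHW_VERSION"
    printf '%%lhwdata:usertime:%s\n' "$(date -u +%Y%m%dT%H%MZ)"
    printf '%%lhwdata:userinfo\n'
    # Contact details are opt-in (LHW_USERINFO is a hypothetical variable).
    printf '%s\n' "${LHW_USERINFO:-Anonymous}"
    for cmd in "$@"; do
        lhw_section "$cmd"
    done
    printf '%%lhwdata:end\n'
}

# Run the default command list only when executed, not when sourced.
if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
    lhw_collect "cat /proc/version" "lspci"
fi
```

It would be used exactly as in the session: simple_lhw_collector.sh | gzip > my_lhw_data.gz. Supporting a new tool is just one more argument to lhw_collect, and since the output is never parsed or filtered at collection time, older files stay valid.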

2. The storage servers could be anything: an anonymous FTP upload, a
web site with an upload form, an XML service accessed directly from the
data collecting software, ... To allow easy collecting and mirroring,
all storage systems and their mirrors could use a standard flat file
and directory layout. For example, the official "fedora" collecting
site could be structured as follows:

http://www.fedora.us/lhw/200311/20031105/fedora-lhw-000042.gz
http://www.fedora.us/lhw/200311/20031105/fedora-lhw-000043.gz

This makes mirroring and aggregation easy, since files are written once
and never moved, renamed, or changed (the date is the server date at
the time of submission, and the number is a serial id within that day).
A collecting site could mirror all files with any technology (FTP
mirroring, rsync). High-bandwidth mirrors could also offer monthly
tarballs so that anyone has easy access to all the user data.

When a user submits a file to a storage server, she should get back the
URL of the stored file, which would serve as its id.
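A minimal sketch of the server side of this layout, assuming a plain directory tree served over HTTP. The names lhw_path and lhw_store and the settings LHW_ROOT, LHW_BASE_URL and LHW_PREFIX are hypothetical, and starting each day's serial at 1 is an assumption. Write-once behaviour comes from bash's noclobber option, which makes a ">" redirect fail when the target file already exists.

```bash
#!/bin/bash
# Hypothetical storage-server helpers. LHW_ROOT, LHW_BASE_URL and
# LHW_PREFIX are illustrative settings, not part of any existing tool.
LHW_ROOT="${LHW_ROOT:-/srv/lhw}"
LHW_BASE_URL="${LHW_BASE_URL:-http://www.fedora.us/lhw}"
LHW_PREFIX="${LHW_PREFIX:-fedora}"

# Relative path for serial $2 on server date $1 (YYYYMMDD),
# e.g. 200311/20031105/fedora-lhw-000042.gz
lhw_path() {
    local day="$1" serial="$2"
    printf '%s/%s/%s-lhw-%06d.gz\n' "${day:0:6}" "$day" "$LHW_PREFIX" "$serial"
}

# Store the gzipped file $1 under today's directory with the first free
# serial number, then print its URL, which serves as the submission id.
lhw_store() {
    local src="$1" day dir rel serial
    [[ -r "$src" ]] || return 1
    day=$(date -u +%Y%m%d)            # server date at submission time
    dir="$LHW_ROOT/${day:0:6}/$day"
    mkdir -p "$dir" && [[ -w "$dir" ]] || return 1
    for ((serial = 1; serial <= 999999; serial++)); do
        rel=$(lhw_path "$day" "$serial")
        # With noclobber the redirect fails if the file already exists,
        # so a stored file is never overwritten (write once, never change).
        if ( set -o noclobber; cat "$src" > "$LHW_ROOT/$rel" ) 2>/dev/null; then
            printf '%s/%s\n' "$LHW_BASE_URL" "$rel"
            return 0
        fi
    done
    return 1
}
```

Scanning for the first free serial keeps the sketch stateless; a busy server would more likely keep a per-day counter. Because nothing is ever renamed or rewritten, a mirror only has to fetch files it does not have yet.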

3. The fun part is the service part. Maybe Google would index
compressed text files (perhaps served by a web server that uncompresses
them transparently); otherwise anyone is free to develop any software
or web site:

- we've done our best to collect all possible hardware data in a simple
way, without committing to a particular database format. In particular,
we can improve the data collector to follow future Linux developments
without changing the existing data.

- we've made it easy to assemble, mirror and process the hardware
data. Everyone has fair access to it, from the individual kernel
developer looking for a driver to develop and debug, to the major
Linux companies.
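To give an idea of how little is needed to process these files, here is a sketch of a helper that pulls the output of one command out of a data file. The name lhw_extract is hypothetical, and it assumes that no line of command output itself starts with "%lhwdata:", which the format above does not guarantee.

```bash
#!/bin/bash
# Hypothetical helper: print the raw output recorded for one command in an
# lhw data file. Assumes no output line itself starts with "%lhwdata:".
lhw_extract() {
    local file="$1" cmd="$2"
    # gzip -dcf decompresses gzip input and copies plain text through unchanged.
    gzip -dcf "$file" | awk -v want="%lhwdata:command:$cmd" '
        /^%lhwdata:/ {
            h = $0
            sub(/[ \t]+$/, "", h)      # tolerate trailing blanks in headers
            inside = (h == want)       # any header ends the previous section
            next
        }
        inside { print }
    '
}
```

For example, lhw_extract my_lhw_data.gz lspci would print just the lspci listing from the session above, and running it with grep over a mirrored month directory would list every submission containing, say, a GeForce2 Go.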

What do people think of the general idea? Maybe Red Hat could
contribute some data to a first version (there are probably legal
issues; I'm not a lawyer)?

I have no idea how many systems to expect. As I said, files should be
under 20k compressed, so we should be able to host about 50 000
systems per gigabyte, and there's no need for a central server to begin
with: people can contribute their repository URLs to the fedora wiki.

Laurent