• R/O
  • SSH

xtab: 提交

Default repository for xtab.py


Commit MetaInfo

修訂a573cf923d273269f88c6534f74f12345a8af612 (tree)
時間2018-08-24 12:47:33
作者Dreas Nielsen <dreas.nielsen@gmai...>
CommiterDreas Nielsen

Log Message

Restructured repository and added distribution setup files.

Change Summary

差異

diff -r fbcf053ca74c -r a573cf923d27 .hgignore
--- a/.hgignore Tue Aug 21 12:45:10 2018 -0700
+++ b/.hgignore Thu Aug 23 20:47:33 2018 -0700
@@ -3,4 +3,7 @@
33 setup-compile.py
44 build/*
55 dist/*
6+test/*
67 xtab-0.7.0.0
8+.*~
9+
diff -r fbcf053ca74c -r a573cf923d27 LICENSE.txt
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/LICENSE.txt Thu Aug 23 20:47:33 2018 -0700
@@ -0,0 +1,11 @@
1+xtab.py
2+Copyright (c) 2008, R.Dreas Nielsen
3+
4+This program is free software: you can redistribute it and/or modify it under
5+the terms of the GNU General Public License as published by the Free Software
6+Foundation, either version 3 of the License, or (at your option) any later
7+version. This program is distributed in the hope that it will be useful, but
8+WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
9+FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more
10+details. The GNU General Public License is available at
11+http://www.gnu.org/licenses/.
diff -r fbcf053ca74c -r a573cf923d27 MANIFEST.in
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/MANIFEST.in Thu Aug 23 20:47:33 2018 -0700
@@ -0,0 +1,2 @@
1+include doc/xtab.htm
2+include LICENSE.txt
diff -r fbcf053ca74c -r a573cf923d27 README.txt
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/README.txt Thu Aug 23 20:47:33 2018 -0700
@@ -0,0 +1,185 @@
1+xtab.py
2+Crosstabulate data in a text file.
3+
4+xtab.py is a Python module and command-line program that rearranges data from
5+a normalized format to a crosstabulated format.
6+
7+Contents
8+======================
9+
10+ Installation
11+ Capabilities
12+ Required and Optional Arguments
13+ Notes
14+ Copyright and License
15+
16+
17+Installation
18+======================
19+
20+You can install the program either manually or using Python's installation tools.
21+
22+
23+Manual Installation
24+--------------------
25+
26+Because the functionality of the xtab program is contained in a single
27+Python script, you can simply extract the xtab.py script from the zipped
28+package file, and put the script wherever you want. This may be in the
29+standard location for Python scripts, or in any other location from which you
30+want to execute it.
31+
32+
33+Automated Installation
34+-----------------------
35+
36+The automated installation process uses Python's standard setup tool to place
37+the program in the standard location for third-party scripts (e.g.,
38+C:\Python\Scripts or C:\Program Files\Python\Scripts). Follow these steps to
39+perform an automated installation:
40+
41+1. Unzip the package file. The file can be unzipped in a temporary location,
42+ such as C:\Temp. The contents of the package will be placed in a
43+ subdirectory named for the program and release version.
44+2. Open a command window in the subdirectory containing the unzipped files and
45+ type the following command at the prompt:
46+
47+ python setup.py install
48+
49+If you want to install to a non-standard location, instructions for a
50+customized installation can be found at
51+http://docs.python.org/install/index.html#inst-alt-install.
52+
53+
54+
55+Capabilities
56+======================
57+
58+You can use the xtab program to:
59+
60+ * Rearrange data exported from a database to better suit its
61+ subsequent usage in statistical, modeling, graphics, or other
62+ software, or for easier visual review and table preparation.
63+ * Convert a single file (table) of data to a SQLite database.
64+ * Check for multiple rows of data in a text file with the same
65+ key values.
66+
67+
68+Required and Optional Arguments
69+================================
70+
71+Required Arguments
72+-------------------
73+
74+-i <filename>
75+ The name of the input file from which to read data. This must be a text file,
76+ with data in a normalized format. The first line of the file must contain
77+ column names.
78+
79+-o <filename>
80+ The name of the output file to create. The output file will be created as
81+ a .csv file.
82+
83+-r <column_name1> [column_name2 [...]]
84+ One or more column names to use as row headers (space delimited). Unique
85+ values of these columns will appear at the beginning of every output line.
86+
87+-c <column_name1> [column_name2 [...]]
88+ One or more column names to use as column headers in the output (space
89+ delimited). A crosstab column (or columns) will be created for every unique
90+ combination of values of these fields in the input.
91+
92+-v <column_name1> [column_name2 [...]]
93+ One or more column names with values to be used to fill the cells of the
94+ cross-table. If n columns names are specified, then there will be n columns
95+ in the output table for each of the column headers corresponding to values
96+ of the -c argument. The column names specified with the -v argument will
97+ be appended to the output column headers created from values of the -c
98+ argument. There should be only one value of the -v column(s) for each
99+ combination of the -r and -c columns; if there is more than one, a warning
100+ will be printed and only the first value will appear in the output.
101+ (That is, values are not combined in any way when there are multiple values
102+ for each output cell.)
103+
104+
105+Optional Arguments
106+-------------------
107+
108+-d
109+ Prints output column headers in two rows. The first row contains values
110+ of the columns specified by the -h argument, and the second row contains
111+ the column names specified by the -v argument. If this is not specified,
112+ output column headers are printed in one row, with elements joined by
113+ underscores to facilitate parsing by other programs.
114+
115+-f
116+ Use a temporary (sqlite) file instead of memory for intermediate storage.
117+
118+-k Keep (i.e., do not delete) the sqlite file. Only useful with the "-f" option.
119+ Unless the "-t" option is also used, the table name will be "src".
120+
121+-t <tablename>
122+ Name to use for the table in the intermediate sqlite database. Only useful
123+ with the "-f" and "-k" options.
124+
125+ -e [filename]
126+ Log all error messages, to a file if the filename is specified or to the
127+ console (stderr) if the filename is not specified.
128+
129+ -q <filename>
130+ Log the sequence of SQL commands used to extract data from the input file
131+ to write the output file, including the result of each command.
132+
133+ -h
134+ Print a summary of the command-line arguments and exit.
135+
136+
137+Notes
138+======================
139+
140+ * Multiple data values can be crosstabbed, in which case the output
141+ will contain multiple sets of similar columns.
142+ * Either one or two rows of headers can be produced in the output file.
143+ One row is the default, and is most suitable when the output file will
144+ be further processed by other software. Two rows facilitate readability
145+ when the output contains multiple sets of similar columns.
146+ * The xtab program does not carry out any summarization or
147+ calculation on the data values, and therefore there should be
148+ no more than one data value to be placed in each cell of the output
149+ table. More than one value per cell is regarded as an error, and in
150+ such cases only one of the multiple values will be put in the cell.
151+ * Error messages can be logged to either the console or a file. If no
152+ error logging option is specified, then if there are multiple values
153+ to be put in a cell (the most likely data error), a single message
154+ will be printed on the console indicating that at least one error of
155+ this type has occurred. If an error logging option is specified,
156+ then the SQL for all individual cases where there are multiple values
157+ per cell will be logged.
158+ * The SQL commands used to extract data from the input file for each
159+ output table cell can be logged to a file.
160+ * As an intermediate step in the crostabbing process, data are converted
161+ to a SQLite table. By default, this table is created in memory.
162+ However, it can optionally be created on disk, and preserved so that
163+ it is available after the crosstabulation is completed.
164+ * There are no inherent limits to the number of rows or columns in the
165+ input or output files. (So the output may exceed the limits of some
166+ other software.)
167+ * Input and output file names, and column names in the input file that
168+ are to be used for row headings, column headings, and cell values are
169+ all required as command-line arguments. If any required arguments are
170+ missing, an exception will be raised, whatever the error logging option.
171+
172+
173+Copyright and License
174+======================
175+
176+Copyright (c) 2008, R.Dreas Nielsen
177+
178+This program is free software: you can redistribute it and/or modify it under
179+the terms of the GNU General Public License as published by the Free Software
180+Foundation, either version 3 of the License, or (at your option) any later
181+version. This program is distributed in the hope that it will be useful, but
182+WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
183+FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more
184+details. The GNU General Public License is available at
185+http://www.gnu.org/licenses/.
diff -r fbcf053ca74c -r a573cf923d27 setup.py
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/setup.py Thu Aug 23 20:47:33 2018 -0700
@@ -0,0 +1,99 @@
1+from distutils.core import setup
2+
3+setup(name='xtab',
4+ version='0.10.0.0',
5+ description="Crosstabulate data in a text file.",
6+ author='Dreas Nielsen',
7+ author_email='dreas.nielsen@gmail.com',
8+ url='none',
9+ scripts=['xtab/xtab.py'],
10+ license='GPL',
11+ classifiers=[
12+ 'Environment :: Console',
13+ 'Intended Audience :: End Users/Desktop',
14+ 'License :: OSI Approved :: GNU General Public License (GPL)',
15+ 'Operating System :: OS Independent',
16+ 'Topic :: Office/Business'
17+ ],
18+ long_description="""``xtab.py`` is a Python module and command-line program that
19+rearranges data from a normalized format to a crosstabulated format. It takes data
20+in this form:
21+
22+======= ========== =====
23+Station Date Value
24+------- ---------- -----
25+WQ-01 2006-05-23 4.5
26+WQ-02 2006-05-23 3.7
27+WQ-03 2006-05-23 6.8
28+WQ-01 2006-06-15 9.7
29+WQ-02 2006-05-15 5.1
30+WQ-03 2006-06-15 7.2
31+WQ-01 2006-07-19 10
32+WQ-02 2006-07-19 6.1
33+WQ-03 2006-07-19 8.8
34+======= ========== =====
35+
36+and rearranges it into this form:
37+
38+======= ========== ========== ==========
39+Station 2006-05-23 2006-06-15 2006-07-19
40+------- ---------- ---------- ----------
41+WQ-01 4.5 3.7 6.8
42+WQ-02 9.7 5.1 7.2
43+WQ-03 10 6.1 8.8
44+======= ========== ========== ==========
45+
46+Input and output are both text (CSV) files.
47+
48+
49+Capabilities
50+=============
51+
52+You can use the xtab program to:
53+
54+* Rearrange data exported from a database to better suit its
55+ subsequent usage in statistical, modeling, graphics, or other
56+ software, or for easier visual review and table preparation.
57+* Convert a single file (table) of data to a SQLite database.
58+* Check for multiple rows of data in a text file with the same
59+ key values.
60+
61+
62+Notes
63+======
64+
65+* Multiple data values can be crosstabbed, in which case the output
66+ will contain multiple sets of similar columns.
67+* Either one or two rows of headers can be produced in the output file.
68+ One row is the default, and is most suitable when the output file will
69+ be further processed by other software. Two rows facilitate readability
70+ when the output contains multiple sets of similar columns.
71+* The xtab program does not carry out any summarization or
72+ calculation on the data values, and therefore there should be
73+ no more than one data value to be placed in each cell of the output
74+ table. More than one value per cell is regarded as an error, and in
75+ such cases only one of the multiple values will be put in the cell.
76+* Error messages can be logged to either the console or a file. If no
77+ error logging option is specified, then if there are multiple values
78+ to be put in a cell (the most likely data error), a single message
79+ will be printed on the console indicating that at least one error of
80+ this type has occurred. If an error logging option is specified,
81+ then the SQL for all individual cases where there are multiple values
82+ per cell will be logged.
83+* The SQL commands used to extract data from the input file for each
84+ output table cell can be logged to a file.
85+* As an intermediate step in the crostabbing process, data are converted
86+ to a SQLite table. By default, this table is created in memory.
87+ However, it can optionally be created on disk, and preserved so that
88+ it is available after the crosstabulation is completed.
89+* There are no inherent limits to the number of rows or columns in the
90+ input or output files. (So the output may exceed the limits of some
91+ other software.)
92+* Input and output file names, and column names in the input file that
93+ are to be used for row headings, column headings, and cell values are
94+ all required as command-line arguments. If any required arguments are
95+ missing, an exception will be raised, whatever the error logging option.
96+* Data rows are sorted alphanumerically by the row headers and column
97+ headers are sorted alphanumerically in the output.
98+"""
99+ )
diff -r fbcf053ca74c -r a573cf923d27 xtab.py
--- a/xtab.py Tue Aug 21 12:45:10 2018 -0700
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,614 +0,0 @@
1-#! /usr/bin/python
2-"""xtab.py
3-
4-PURPOSE
5- Read a table (from a text file) of data in normalized form and cross-tab it,
6- allowing multiple data columns to be crosstabbed.
7-
8-AUTHOR
9- R. Dreas Nielsen (RDN)
10-
11-COPYRIGHT AND LICENSE
12- Copyright (c) 2007-2018, R.Dreas Nielsen
13- This program is free software: you can redistribute it and/or modify
14- it under the terms of the GNU General Public License as published by
15- the Free Software Foundation, either version 3 of the License, or
16- (at your option) any later version.
17- This program is distributed in the hope that it will be useful,
18- but WITHOUT ANY WARRANTY; without even the implied warranty of
19- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
20- GNU General Public License for more details.
21- The GNU General Public License is available at <http://www.gnu.org/licenses/>
22-
23-NOTES
24- 1. This code can be used either as a module or as a stand-alone script.
25- 2. The sole function intended to be used by callers of this module is 'xtab()'.
26- 3. When there are multiple values in the input that should go into a single
27- cell of the output, only the first of these is written into that cell
28- ('first' is indeterminate). The 'xtab()' function allows logging of
29- the data selection statement (SQL) used to obtain the data for each cell,
30- and the result(s) obtained, and thus to determine which cell(s) have
31- multiple values.
32-
33-HISTORY
34- ---------- ------------------------------------------------------------------------------
35- Date Revisions
36- ---------- ------------------------------------------------------------------------------
37- 9/12/2007 Created; incomplete. RDN.
38- 10/27/2007 Finished full functionality of 'xtab()'. Usable as module or script. RDN.
39- 10/28/2007 Cleaned up code and documentation. Improvements to be made:
40- 1. Implement a class to wrap csv reader and writer objects
41- to ensure appropriate closure of underlying file objects
42- on error.
43- 2. Implement a class (object) for the input file to simplify
44- format evaluation and opening (i.e., reduce multiple opens).
45- 3. Provide a default log-writing handler, and possibly an additional
46- command-line argument consisting of an output filename for
47- the log.
48- 4. Implement error-checking of the column names in the command
49- line arguments prior to calling 'xtab()' (do #2 above first).
50- 5. Add more specific error traps throughout.
51- RDN.
52- 1/13/2008 Revised to use the csv module's format sniffer instead of custom
53- code in this module. This addresses item 2 in the 10/28/2007 list.
54- Revised so that the sqlite db is created in memory by default, and a
55- temporary file is created only if specified by the (new) '-f'
56- command-line option. RDN.
57- 6/14/2008 Doubled apostrophes in string data values. RDN.
58- 12/20/2008 Added options to keep the sqlite file, to specifiy the sqlite table
59- name, and to log both error messages and SQL commands. Bumped version
60- number to 0.7. RDN.
61- 10/19/2011 Added the "-n" option, allowing null values to be replaced in the
62- output with some user-specified string. Version 0.8. RDN
63- 10/20/2011 Added the "-d3" and "-d4" options to allow other types of header
64- row output. Version 0.9. RDN.
65- 02/08/2010 Corrected tests for -f and -k options. Version 0.9.1. RDN.
66- 2018-02-15 Modified to sort column and row header values in ascending order. RDN.
67- 2018-03-05 Modified so that sorting is controlled by a new option, "-s". RDN.
68- 2018-08-21 Modified to delete the error message file if it was created but
69- no errors occurred. RDN.
70- 2018-08-21 Closed the error logger if an empty error log file is to be deleted. RDN.
71-=====================================================================================
72-"""
73-
74-#=====================================================================================
75-# TODO:
76-# * Implement a class to wrap csv reader and writer objects
77-# to ensure appropriate closure of underlying file objects
78-# on error. (Or use 'with' in 2.6/3.0)
79-# * Implement error-checking of the column names in the command
80-# line arguments prior to calling 'xtab()'.
81-# * Add more specific error traps throughout.
82-#=====================================================================================
83-
84-
85-_version = "0.10.1.0"
86-_vdate = "2018-03-05"
87-
88-import sys
89-import os
90-import os.path
91-import csv
92-import sqlite3
93-import copy
94-import logging
95-import traceback
96-
97-_errmsg_noinfile = "No input filename specified."
98-_errmsg_badinfile = "Input file does not exist."
99-_errmsg_nooutfile = "No output filename specified."
100-_errmsg_norowheaders = "No row header columns specified."
101-_errmsg_nocolumheaders = "No column header columns specified."
102-_errmsg_nocellcolumns = "No cell value columns specified."
103-_errmsg_baderrlogfile = "Only one error log file name should be specified."
104-_errmsg_badsqllogfile = "Only one SQL log file name should be specified."
105-
106-__help_msg = """Required Arguments:
107- -i <filename>
108- The name of the input file from which to read data.
109- This must be a text file, with data in a normalized format.
110- The first line of the file must contain column names.
111- -o <filename>
112- The name of the output file to create.
113- The output file will be created as a .csv file.
114- -r <column_name1> [column_name2 [...]]
115- One or more column names to use as row headers.
116- Unique values of these columns will appear at the beginning of every
117- output line.
118- -c <column_name1> [column_name2 [...]]
119- One or more column names to use as column headers in the output.
120- A crosstab column (or columns) will be created for every unique
121- combination of values of these fields in the input.
122- -v <column_name1> [column_name2 [...]]
123- One or more column names with values to be used to fill the cells
124- of the cross-table. If n columns names are specified, then there will
125- be n columns in the output table for each of the column headers
126- corresponding to values of the -c argument. The column names specified
127- with the -v argument will be appended to the output column headers
128- created from values of the -c argument. There should be only one value
129- of the -v column(s) for each combination of the -r and -c columns;
130- if there is more than one, a warning will be printed and only the first
131- value will appear in the output. (That is, values are not combined in
132- any way when there are multiple values for each output cell.)
133-Optional Arguments:
134- -d[1|2|3|4]
135- Controls the format of column headers. The four alternatives are:
136- -d1 or no option specified
137- One row of column headers, with elements joined by underscores
138- to facilitate parsing by other programs.
139- -d or -d2
140- Two rows of column headers. The first row contains values of the
141- columns specified by the -c argument, and the second row contains
142- the column names specified by the -v argument.
143- -d3
144- One header row for each of the values of the columns specified by
145- the -c argument, plus one row with the column names specified by
146- the -v argument.
147- -d4
148- Like -d3, but the values of the columns specified by the -c
149- argument are labeled with (preceded by) the column names.
150- -f
151- Use a temporary (sqlite) file instead of memory for intermediate
152- storage.
153- -k
154- Keep (i.e., do not delete) the sqlite file. Only useful with the
155- "-f" option. Unless the "-t" option is also used, the table name will
156- be "src".
157- -n <default_string>
158- Use the specified default string in the output wherever an empty or
159- null value would otherwise appear.
160- -s Sort columns and rows in ascending alphabetic order.
161- -t <tablename>
162- Name to use for the table in the intermediate sqlite database. Only
163- useful with the "-f" and "-k" options.
164- -e [filename]
165- Log all error messages, to a file if the filename is specified or to the
166- console if the filename is not specified.
167- -q <filename>
168- Log the sequence of SQL commands used to extract data from the input
169- file to write the output file, including the result of each command.
170- -h
171- Print this help and exit.
172-Notes:
173- 1. Column names should be specified in the same case as they appear in the
174- input file.
175- 2. The -f option creates a temporary file in the same directory as the
176- output file. This file has the same name as the input file, but an
177- extension of '.sqlite'.
178- 3. There are no inherent limits to the number of rows or columns in the
179- input or output files.
180- 4. Missing required arguments will result in an exception rather than an
181- error message, whatever the error logging option. If no error logging
182- option is specified, then if there are multiple values per cell (the
183- most likely data error), a single message will be printed on the console.
184-"""
185-
186-# enum for header row codes
187-hdrs_1, hdrs_2, hdrs_many, hdrs_labeled = (1, 2, 3, 4)
188-
189-def xtab(infilename, rownames, xtab_colnames, xtab_datanames, outfilename,
190- header_rows=hdrs_1, file_db=False, keep_file_db=False, tablename="src",
191- error_reporter=None, sql_reporter=None, nullfill=None, sort_alpha=False):
192- """Cross-tab data in the specified input file and write it to the output file.
193- Arguments:
194- infilename: string of the input file name. Some diagnosis of file format
195- (CSV or tab formatted) will be performed.
196- rownames: list of strings of column names in the input file that will be used
197- as row headers in the output file.
198- xtab_colnames: list of strings of column names in the input file that will be
199- used as primary column headers in the output file.
200- xtab_datanames: list of strings of column names in the input file that will be
201- crosstabbed in the output file. These column names will also be used as
202- secondary column names in the output file.
203- outfilename: string of the output file name. This file will all be written as CSV.
204- dualheader: boolean controlling whether or not there will be one or two header
205- rows in the output file. If a single header row is used, then the primary and
206- secondary column headers will be joined in each column header. If two column
207- headers are used, then the primary column headers will be used on the first
208- line of headers, and the secondary column headers will be used on the second
209- line of headers.
210- file_db: boolean controlling whether or not the sqlite db is created as a disk file
211- (if True) or in memory (if False, the default).
212- keep_file_db: boolean controlling whether or not a sqlite disk file is retained
213- (if True) or deleted after it has been used.
214- error_reporter: logging.Logger object to report nonfatal errors (specifically, the presence
215- of more than one value for a cell).
216- sql_reporter: logging.Logger object to report the sqlite queries executed and their
217- results.
218- nullfill: the value with which to replace null (empty) values in the output.
219- Return value
220- The number of warnings or errors encountere.
221-
222- When multiple column headers in the input file are used as a single column header in
223- the output file, the column names are joined with an underscore. This is to facilitate
224- any subsequent parsing to be done by other programs (e.g., R).
225- """
226- multiple_vals = False # A flag indicating whether or not multiple values were found for a single crosstab cell
227- outfile = open(outfilename, "wb") # The csv module adds an extra <CR> if "wt" is specified
228- csvout = csv.writer(outfile)
229- reportable_errors = 0
230-
231- # Move the data into sqlite for easy random access.
232- if file_db:
233- inhead, intail = os.path.split(infilename)
234- sqldbname = os.path.join(inhead, os.path.splitext(intail)[0] + ".sqlite")
235- try:
236- os.unlink(sqldbname)
237- except:
238- pass
239- else:
240- sqldbname = None
241- if tablename == None:
242- tablename = "src"
243- sqldb = copy_to_sqlite(infilename, sqldbname, tablename)
244-
245- # Get list of unique values for 'xtab_colnames' columns
246- if sort_alpha:
247- sqlcmd = "select distinct %s from %s order by %s;" % (",".join(xtab_colnames), tablename, ",".join(xtab_colnames))
248- else:
249- sqlcmd = "select distinct %s from %s;" % (",".join(xtab_colnames), tablename)
250- xtab_vals = sqldb.execute(sqlcmd).fetchall()
251-
252- # Write output headers.
253- if header_rows == hdrs_1:
254- # One header row
255- outstrings = [n for n in rownames]
256- for n in xtab_vals:
257- for i in xtab_datanames:
258- outstrings.append("%s_%s" % ("_".join(n), i.replace("'", "''")))
259- csvout.writerow(outstrings)
260- elif header_rows == hdrs_2:
261- # Two header rows
262- extra_cols = len(xtab_datanames) - 1
263- # Write header row 1/2
264- outstrings = ['' for n in rownames]
265- for n in xtab_vals:
266- hdr = " ".join(n)
267- outstrings.append(hdr.replace("'", "''"))
268- for i in range(extra_cols):
269- outstrings.append('')
270- csvout.writerow(outstrings)
271- # Write header row 2/2
272- outstrings = [n for n in rownames]
273- for n in xtab_vals:
274- for i in xtab_datanames:
275- outstrings.append(i.replace("'", "''"))
276- csvout.writerow(outstrings)
277- elif header_rows == hdrs_many:
278- # One header row for each item in xtab_vals plus a row
279- # for xtab_datanames.
280- extra_cols = len(xtab_datanames) - 1
281- for i in range(len(xtab_colnames)):
282- outstrings = ['' for n in rownames]
283- for n in xtab_vals:
284- outstrings.append(n[i].replace("'", "''"))
285- for x in range(extra_cols):
286- outstrings.append('')
287- csvout.writerow(outstrings)
288- # Write last header row of xtab_datanames
289- outstrings = [n for n in rownames]
290- for n in xtab_vals:
291- for i in xtab_datanames:
292- outstrings.append(i.replace("'", "''"))
293- csvout.writerow(outstrings)
294- else: # header_rows == hdrs_labeled
295- extra_cols = len(xtab_datanames) - 1
296- for i in range(len(xtab_colnames)):
297- outstrings = ['' for n in rownames]
298- for n in xtab_vals:
299- outstrings.append("%s: %s" % (xtab_colnames[i], n[i].replace("'", "''")))
300- for x in range(extra_cols):
301- outstrings.append('')
302- csvout.writerow(outstrings)
303- # Write last header row of xtab_datanames
304- outstrings = [n for n in rownames]
305- for n in xtab_vals:
306- for i in xtab_datanames:
307- outstrings.append(i.replace("'", "''"))
308- csvout.writerow(outstrings)
309-
310- # Write output data
311- # For each unique combination of row headers
312- # Initiate a new output line
313- # Get the row headers
314- # For every item in the xtab_vals
315- # Select the 'xtab_datanames' columns from sqlite for the row headers and xtab_vals
316- # Append the first result (set warning if >1) to the output line
317- # Write the output line
318- #
319- # Get a list of unique combinations of row headers
320- if sort_alpha:
321- sqlcmd = "SELECT DISTINCT %s FROM %s ORDER BY %s;" % (",".join(rownames), tablename, ",".join(rownames))
322- else:
323- sqlcmd = "SELECT DISTINCT %s FROM %s;" % (",".join(rownames), tablename)
324- row_hdr_vals = sqldb.execute(sqlcmd).fetchall()
325- row_counter = 0
326-
327- for l in row_hdr_vals:
328- row_counter = row_counter + 1
329- col_counter = 0
330- outstrings = []
331- # Add the row headers to the list of outstrings
332- for rn in range(len(l)):
333- outstrings.append(l[rn].replace("'", "''"))
334- # Make a list of WHERE conditions for the row header variables
335- sqlcond = ["%s='%s'" % (rownames[i], l[i].replace("'", "''")) for i in range(len(rownames))]
336- for n in xtab_vals:
337- col_counter = col_counter + 1
338- # Add the WHERE conditions for the crosstab values
339- selcond = copy.deepcopy(sqlcond)
340- for cn in range(len(xtab_colnames)):
341- selcond.append("%s='%s'" % (xtab_colnames[cn], n[cn].replace("'", "''")))
342- # Create and execute the SQL to get the data values
343- sqlcmd = "SELECT %s FROM %s WHERE %s" % (",".join(xtab_datanames), tablename, " AND ".join(selcond))
344- if sql_reporter:
345- sql_reporter.log(logging.INFO, "%s" % sqlcmd)
346- # <Debugging>
347- #print(sqlcmd)
348- #return
349- # </Debugging>
350- data_vals = sqldb.execute(sqlcmd).fetchall()
351- if sql_reporter:
352- for r in data_vals:
353- sql_reporter.log(logging.INFO, "\t%s" % "\t".join(r))
354- if len(data_vals) > 1:
355- multiple_vals = True
356- reportable_errors += 1
357- if error_reporter:
358- error_reporter.log(logging.WARNING, "Multiple result rows for the command '%s'--only the first is used." % (sqlcmd))
359- if len(data_vals) == 0:
360- if nullfill:
361- for n in range(len(xtab_datanames)):
362- outstrings.append(nullfill)
363- else:
364- for n in range(len(xtab_datanames)):
365- outstrings.append('')
366- else:
367- data = data_vals[0]
368- for n in range(len(xtab_datanames)):
369- outstrings.append(data[n])
370- csvout.writerow(outstrings)
371- sqldb.close()
372- if file_db and not keep_file_db:
373- try:
374- os.unlink(sqldbname)
375- except:
376- pass
377- outfile.close()
378- if multiple_vals and not error_reporter:
379- msg = "Warning: multiple data values found for at least one crosstab cell; only the first is displayed."
380- print(msg)
381- if error_reporter:
382- error_reporter(msg)
383- return reportable_errors
384-
385-
386-def unquote(str):
387- """Remove quotes surrounding a string."""
388- if len(str) < 2:
389- return str
390- c1 = str[0]
391- c2 = str[-1:]
392- if c1==c2 and (c1=='"' or c1=="'"):
393- return str[1:-1].replace("%s%s" % (c1, c1), c1)
394- return str
395-
396-
397-def quote_str(str):
398- """Add single quotes around a string."""
399- if len(str) == 0:
400- return "''"
401- if len(str) == 1:
402- if str == "'":
403- return "''''"
404- else:
405- return "'%s'" % str
406- if str[0] != "'" or str[-1:] != "'":
407- return "'%s'" % str.replace("'", "''")
408- return str
409-
410-
411-def quote_list(l):
412- """Add single quotes around all strings in the list."""
413- return [quote_str(x) for x in l]
414-
415-
416-def quote_list_as_str(l):
417- """Convert a list of strings to a single string of comma-delimited, quoted tokens."""
418- return ",".join(quote_list(l))
419-
420-
421-def del_file(fn):
422- """Deletes the specified file if it exists."""
423- if os.path.isfile(fn):
424- os.unlink(fn)
425-
426-
427-def copy_to_sqlite(data_fn, sqlite_fn=None, tablename="src"):
428- """Copies data from a CSV file to a sqlite table.
429- Arguments:
430- data_fn: a string of the data file name with the data to be read.
431- sqlite_fn: a string of the name of the sqlite file to create, or None if
432- sqlite is to use memory instead.
433- tablename: the name of the sqlite table to create
434- Value:
435- The sqlite connection object.
436- """
437- dialect = csv.Sniffer().sniff(open(data_fn, "rt").readline())
438- inf = csv.reader(open(data_fn, "rt"), dialect)
439- column_names = inf.next()
440- if sqlite_fn == None:
441- conn = sqlite3.connect(":memory:")
442- else:
443- try:
444- os.unlink(sqlite_fn)
445- except:
446- pass
447- conn = sqlite3.connect(sqlite_fn)
448- if tablename == None:
449- tablename = "src"
450- colstr = ",".join(column_names)
451- try:
452- conn.execute("drop table %s;" % tablename)
453- except:
454- pass
455- conn.execute("create table %s (%s);" % (tablename, colstr))
456- for l in inf:
457- sql = "insert into %s values (%s);" % (tablename, quote_list_as_str(l))
458- conn.execute(sql)
459- conn.commit()
460- return conn
461-
462-
463-def print_help():
464- """Print a program description and brief usage instructions to the console."""
465- print "xtab %s %s -- Cross-tabulates data." % (_version, _vdate)
466- print __help_msg
467-
468-
469-def get_opts(arglist):
470- """Returns a dictionary of command-line arguments. This custom 'getopt' routine is used
471- to allow multiple column names for the -r, -c, and -v arguments with only one use of each
472- flag.
473- """
474- argdict = {}
475- nargs = len(arglist)
476- argno = 1
477- currarg = None
478- currargitems = []
479- while argno < nargs:
480- arg = arglist[argno]
481- if len(arg) > 0:
482- if arg[0] == '-':
483- if currarg:
484- argdict[currarg] = currargitems
485- currarg = arg
486- currargitems = []
487- else:
488- if currarg:
489- currargitems.append(arg)
490- else:
491- argdict[arg] = []
492- argno += 1
493- if currarg:
494- argdict[currarg] = currargitems
495- return argdict
496-
497-
498-def main():
499- """Read and interpret the command-line arguments and options, and carry out
500- the appropriate actions."""
501- args = get_opts(sys.argv)
502- if len(args) == 0 or args.has_key('-h') or args.has_key('--help'):
503- print_help()
504- sys.exit(0)
505- badopts = [ o for o in args.keys() if o not in ['-i', '-o', '-r',
506- '-c', '-v', '-d', '-d1', '-d2', '-d3', '-d4', '-f', '-k', '-s', '-t',
507- '-n', '-e', '-q'] ]
508- if len(badopts) > 0:
509- raise ValueError, "Unrecognized option(s): %s" % ", ".join(badopts)
510- if args.has_key('-i'):
511- if len(args['-i']) == 0:
512- raise ValueError, _errmsg_noinfile
513- infilename = args['-i'][0]
514- if not os.path.exists(infilename):
515- raise ValueError, "%s (%s)" % (_errmsg_badinfile, infilename)
516- else:
517- raise ValueError, _errmsg_noinfile
518- #
519- if args.has_key('-o'):
520- if len(args['-o']) == 0:
521- raise ValueError, _errmsg_nooutfile
522- outfilename = args['-o'][0]
523- else:
524- raise ValueError, _errmsg_nooutfile
525- #
526- if args.has_key('-r'):
527- if len(args['-r']) == 0:
528- raise ValueError, _errmsg_norowheaders
529- rowheaders = args['-r']
530- else:
531- raise ValueError, _errmsg_norowheaders
532- #
533- if args.has_key('-c'):
534- if len(args['-c']) == 0:
535- raise ValueError, _errmsg_nocolumheaders
536- columnheaders = args['-c']
537- else:
538- raise ValueError, _errmsg_nocolumheaders
539- #
540- if args.has_key('-v'):
541- if len(args['-v']) == 0:
542- raise ValueError, _errmsg_nocellcolumns
543- cellvalues = args['-v']
544- else:
545- raise ValueError, _errmsg_nocellcolumns
546- #
547- hdr_opt = hdrs_1
548- if args.has_key('-d') or args.has_key('-d2'):
549- hdr_opt = hdrs_2
550- if args.has_key('-d3'):
551- hdr_opt = hdrs_many
552- if args.has_key('-d4'):
553- hdr_opt = hdrs_labeled
554- file_db = args.has_key('-f')
555- keep_file_db = args.has_key('-k')
556- tablename = 'src'
557- if args.has_key('-t'):
558- if len(args['-t']) == 1:
559- tablename = args['-t'][0]
560- nullfill = None
561- if args.has_key('-n'):
562- if len(args['-n']) == 1:
563- nullfill = args['-n'][0]
564- sort_alpha = args.has_key('-s')
565- #
566- # Set up logging
567- #logging.basicConfig(level=logging.INFO, filemode="w", filename='')
568- err_logger = None
569- sql_logger = None
570- error_file = None
571- if args.has_key('-e'):
572- err_logger = logging.getLogger("err")
573- err_logger.setLevel(logging.WARNING)
574- if len(args['-e']) == 0:
575- err_logger.addHandler(logging.StreamHandler())
576- else:
577- if len(args['-e']) > 1:
578- raise ValueError, _errmsg_baderrlogfile
579- error_file = args['-e'][0]
580- del_file(error_file)
581- file_logger = logging.FileHandler(error_file, "w")
582- err_logger.addHandler(file_logger)
583- if args.has_key('-q'):
584- if len(args['-q']) <> 1:
585- raise ValueError, _errmsg_badsqllogfile
586- sql_logger = logging.getLogger("sql")
587- sql_logger.setLevel(logging.INFO)
588- sql_logger.addHandler(logging.FileHandler(args['-q'][0], "w"))
589- #
590- errors = xtab(infilename, rowheaders, columnheaders, cellvalues, outfilename, hdr_opt,
591- file_db, keep_file_db, tablename, err_logger, sql_logger, nullfill, sort_alpha)
592- if errors == 0 and error_file is not None:
593- # Logger can create the file if a message below the warning level
594- # is issued, even though it will not be logged.
595- file_logger.close()
596- del_file(error_file)
597-
598-
599-
600-if __name__=='__main__':
601- try:
602- main()
603- except SystemExit, x:
604- sys.exit(x)
605- except ValueError, e:
606- sys.stderr.write("%s\n" % str(e))
607- sys.exit(1)
608- except Exception:
609- strace = traceback.extract_tb(sys.exc_info()[2])[-1:]
610- lno = strace[0][1]
611- src = strace[0][3]
612- sys.stderr.write("%s: Uncaught exception %s (%s) on line %s (%s)." % (os.path.basename(sys.argv[0]), str(sys.exc_info()[0]), sys.exc_info()[1], lno, src))
613- sys.exit(1)
614-
diff -r fbcf053ca74c -r a573cf923d27 xtab/xtab.py
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/xtab/xtab.py Thu Aug 23 20:47:33 2018 -0700
@@ -0,0 +1,614 @@
1+#! /usr/bin/python
2+"""xtab.py
3+
4+PURPOSE
5+ Read a table (from a text file) of data in normalized form and cross-tab it,
6+ allowing multiple data columns to be crosstabbed.
7+
8+AUTHOR
9+ R. Dreas Nielsen (RDN)
10+
11+COPYRIGHT AND LICENSE
12+ Copyright (c) 2007-2018, R.Dreas Nielsen
13+ This program is free software: you can redistribute it and/or modify
14+ it under the terms of the GNU General Public License as published by
15+ the Free Software Foundation, either version 3 of the License, or
16+ (at your option) any later version.
17+ This program is distributed in the hope that it will be useful,
18+ but WITHOUT ANY WARRANTY; without even the implied warranty of
19+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
20+ GNU General Public License for more details.
21+ The GNU General Public License is available at <http://www.gnu.org/licenses/>
22+
23+NOTES
24+ 1. This code can be used either as a module or as a stand-alone script.
25+ 2. The sole function intended to be used by callers of this module is 'xtab()'.
26+ 3. When there are multiple values in the input that should go into a single
27+ cell of the output, only the first of these is written into that cell
28+ ('first' is indeterminate). The 'xtab()' function allows logging of
29+ the data selection statement (SQL) used to obtain the data for each cell,
30+ and the result(s) obtained, and thus to determine which cell(s) have
31+ multiple values.
32+
33+HISTORY
34+ ---------- ------------------------------------------------------------------------------
35+ Date Revisions
36+ ---------- ------------------------------------------------------------------------------
37+ 9/12/2007 Created; incomplete. RDN.
38+ 10/27/2007 Finished full functionality of 'xtab()'. Usable as module or script. RDN.
39+ 10/28/2007 Cleaned up code and documentation. Improvements to be made:
40+ 1. Implement a class to wrap csv reader and writer objects
41+ to ensure appropriate closure of underlying file objects
42+ on error.
43+ 2. Implement a class (object) for the input file to simplify
44+ format evaluation and opening (i.e., reduce multiple opens).
45+ 3. Provide a default log-writing handler, and possibly an additional
46+ command-line argument consisting of an output filename for
47+ the log.
48+ 4. Implement error-checking of the column names in the command
49+ line arguments prior to calling 'xtab()' (do #2 above first).
50+ 5. Add more specific error traps throughout.
51+ RDN.
52+ 1/13/2008 Revised to use the csv module's format sniffer instead of custom
53+ code in this module. This addresses item 2 in the 10/28/2007 list.
54+ Revised so that the sqlite db is created in memory by default, and a
55+ temporary file is created only if specified by the (new) '-f'
56+ command-line option. RDN.
57+ 6/14/2008 Doubled apostrophes in string data values. RDN.
58+ 12/20/2008 Added options to keep the sqlite file, to specifiy the sqlite table
59+ name, and to log both error messages and SQL commands. Bumped version
60+ number to 0.7. RDN.
61+ 10/19/2011 Added the "-n" option, allowing null values to be replaced in the
62+ output with some user-specified string. Version 0.8. RDN
63+ 10/20/2011 Added the "-d3" and "-d4" options to allow other types of header
64+ row output. Version 0.9. RDN.
65+ 02/08/2010 Corrected tests for -f and -k options. Version 0.9.1. RDN.
66+ 2018-02-15 Modified to sort column and row header values in ascending order. RDN.
67+ 2018-03-05 Modified so that sorting is controlled by a new option, "-s". RDN.
68+ 2018-08-21 Modified to delete the error message file if it was created but
69+ no errors occurred. RDN.
70+ 2018-08-21 Closed the error logger if an empty error log file is to be deleted. RDN.
71+=====================================================================================
72+"""
73+
74+#=====================================================================================
75+# TODO:
76+# * Implement a class to wrap csv reader and writer objects
77+# to ensure appropriate closure of underlying file objects
78+# on error. (Or use 'with' in 2.6/3.0)
79+# * Implement error-checking of the column names in the command
80+# line arguments prior to calling 'xtab()'.
81+# * Add more specific error traps throughout.
82+#=====================================================================================
83+
84+
85+_version = "0.10.1.0"
86+_vdate = "2018-03-05"
87+
88+import sys
89+import os
90+import os.path
91+import csv
92+import sqlite3
93+import copy
94+import logging
95+import traceback
96+
97+_errmsg_noinfile = "No input filename specified."
98+_errmsg_badinfile = "Input file does not exist."
99+_errmsg_nooutfile = "No output filename specified."
100+_errmsg_norowheaders = "No row header columns specified."
101+_errmsg_nocolumheaders = "No column header columns specified."
102+_errmsg_nocellcolumns = "No cell value columns specified."
103+_errmsg_baderrlogfile = "Only one error log file name should be specified."
104+_errmsg_badsqllogfile = "Only one SQL log file name should be specified."
105+
106+__help_msg = """Required Arguments:
107+ -i <filename>
108+ The name of the input file from which to read data.
109+ This must be a text file, with data in a normalized format.
110+ The first line of the file must contain column names.
111+ -o <filename>
112+ The name of the output file to create.
113+ The output file will be created as a .csv file.
114+ -r <column_name1> [column_name2 [...]]
115+ One or more column names to use as row headers.
116+ Unique values of these columns will appear at the beginning of every
117+ output line.
118+ -c <column_name1> [column_name2 [...]]
119+ One or more column names to use as column headers in the output.
120+ A crosstab column (or columns) will be created for every unique
121+ combination of values of these fields in the input.
122+ -v <column_name1> [column_name2 [...]]
123+ One or more column names with values to be used to fill the cells
124+ of the cross-table. If n columns names are specified, then there will
125+ be n columns in the output table for each of the column headers
126+ corresponding to values of the -c argument. The column names specified
127+ with the -v argument will be appended to the output column headers
128+ created from values of the -c argument. There should be only one value
129+ of the -v column(s) for each combination of the -r and -c columns;
130+ if there is more than one, a warning will be printed and only the first
131+ value will appear in the output. (That is, values are not combined in
132+ any way when there are multiple values for each output cell.)
133+Optional Arguments:
134+ -d[1|2|3|4]
135+ Controls the format of column headers. The four alternatives are:
136+ -d1 or no option specified
137+ One row of column headers, with elements joined by underscores
138+ to facilitate parsing by other programs.
139+ -d or -d2
140+ Two rows of column headers. The first row contains values of the
141+ columns specified by the -c argument, and the second row contains
142+ the column names specified by the -v argument.
143+ -d3
144+ One header row for each of the values of the columns specified by
145+ the -c argument, plus one row with the column names specified by
146+ the -v argument.
147+ -d4
148+ Like -d3, but the values of the columns specified by the -c
149+ argument are labeled with (preceded by) the column names.
150+ -f
151+ Use a temporary (sqlite) file instead of memory for intermediate
152+ storage.
153+ -k
154+ Keep (i.e., do not delete) the sqlite file. Only useful with the
155+ "-f" option. Unless the "-t" option is also used, the table name will
156+ be "src".
157+ -n <default_string>
158+ Use the specified default string in the output wherever an empty or
159+ null value would otherwise appear.
160+ -s Sort columns and rows in ascending alphabetic order.
161+ -t <tablename>
162+ Name to use for the table in the intermediate sqlite database. Only
163+ useful with the "-f" and "-k" options.
164+ -e [filename]
165+ Log all error messages, to a file if the filename is specified or to the
166+ console if the filename is not specified.
167+ -q <filename>
168+ Log the sequence of SQL commands used to extract data from the input
169+ file to write the output file, including the result of each command.
170+ -h
171+ Print this help and exit.
172+Notes:
173+ 1. Column names should be specified in the same case as they appear in the
174+ input file.
175+ 2. The -f option creates a temporary file in the same directory as the
176+ output file. This file has the same name as the input file, but an
177+ extension of '.sqlite'.
178+ 3. There are no inherent limits to the number of rows or columns in the
179+ input or output files.
180+ 4. Missing required arguments will result in an exception rather than an
181+ error message, whatever the error logging option. If no error logging
182+ option is specified, then if there are multiple values per cell (the
183+ most likely data error), a single message will be printed on the console.
184+"""
185+
186+# enum for header row codes
187+hdrs_1, hdrs_2, hdrs_many, hdrs_labeled = (1, 2, 3, 4)
188+
189+def xtab(infilename, rownames, xtab_colnames, xtab_datanames, outfilename,
190+ header_rows=hdrs_1, file_db=False, keep_file_db=False, tablename="src",
191+ error_reporter=None, sql_reporter=None, nullfill=None, sort_alpha=False):
192+ """Cross-tab data in the specified input file and write it to the output file.
193+ Arguments:
194+ infilename: string of the input file name. Some diagnosis of file format
195+ (CSV or tab formatted) will be performed.
196+ rownames: list of strings of column names in the input file that will be used
197+ as row headers in the output file.
198+ xtab_colnames: list of strings of column names in the input file that will be
199+ used as primary column headers in the output file.
200+ xtab_datanames: list of strings of column names in the input file that will be
201+ crosstabbed in the output file. These column names will also be used as
202+ secondary column names in the output file.
203+ outfilename: string of the output file name. This file will all be written as CSV.
204+ dualheader: boolean controlling whether or not there will be one or two header
205+ rows in the output file. If a single header row is used, then the primary and
206+ secondary column headers will be joined in each column header. If two column
207+ headers are used, then the primary column headers will be used on the first
208+ line of headers, and the secondary column headers will be used on the second
209+ line of headers.
210+ file_db: boolean controlling whether or not the sqlite db is created as a disk file
211+ (if True) or in memory (if False, the default).
212+ keep_file_db: boolean controlling whether or not a sqlite disk file is retained
213+ (if True) or deleted after it has been used.
214+ error_reporter: logging.Logger object to report nonfatal errors (specifically, the presence
215+ of more than one value for a cell).
216+ sql_reporter: logging.Logger object to report the sqlite queries executed and their
217+ results.
218+ nullfill: the value with which to replace null (empty) values in the output.
219+ Return value
220+ The number of warnings or errors encountere.
221+
222+ When multiple column headers in the input file are used as a single column header in
223+ the output file, the column names are joined with an underscore. This is to facilitate
224+ any subsequent parsing to be done by other programs (e.g., R).
225+ """
226+ multiple_vals = False # A flag indicating whether or not multiple values were found for a single crosstab cell
227+ outfile = open(outfilename, "wb") # The csv module adds an extra <CR> if "wt" is specified
228+ csvout = csv.writer(outfile)
229+ reportable_errors = 0
230+
231+ # Move the data into sqlite for easy random access.
232+ if file_db:
233+ inhead, intail = os.path.split(infilename)
234+ sqldbname = os.path.join(inhead, os.path.splitext(intail)[0] + ".sqlite")
235+ try:
236+ os.unlink(sqldbname)
237+ except:
238+ pass
239+ else:
240+ sqldbname = None
241+ if tablename == None:
242+ tablename = "src"
243+ sqldb = copy_to_sqlite(infilename, sqldbname, tablename)
244+
245+ # Get list of unique values for 'xtab_colnames' columns
246+ if sort_alpha:
247+ sqlcmd = "select distinct %s from %s order by %s;" % (",".join(xtab_colnames), tablename, ",".join(xtab_colnames))
248+ else:
249+ sqlcmd = "select distinct %s from %s;" % (",".join(xtab_colnames), tablename)
250+ xtab_vals = sqldb.execute(sqlcmd).fetchall()
251+
252+ # Write output headers.
253+ if header_rows == hdrs_1:
254+ # One header row
255+ outstrings = [n for n in rownames]
256+ for n in xtab_vals:
257+ for i in xtab_datanames:
258+ outstrings.append("%s_%s" % ("_".join(n), i.replace("'", "''")))
259+ csvout.writerow(outstrings)
260+ elif header_rows == hdrs_2:
261+ # Two header rows
262+ extra_cols = len(xtab_datanames) - 1
263+ # Write header row 1/2
264+ outstrings = ['' for n in rownames]
265+ for n in xtab_vals:
266+ hdr = " ".join(n)
267+ outstrings.append(hdr.replace("'", "''"))
268+ for i in range(extra_cols):
269+ outstrings.append('')
270+ csvout.writerow(outstrings)
271+ # Write header row 2/2
272+ outstrings = [n for n in rownames]
273+ for n in xtab_vals:
274+ for i in xtab_datanames:
275+ outstrings.append(i.replace("'", "''"))
276+ csvout.writerow(outstrings)
277+ elif header_rows == hdrs_many:
278+ # One header row for each item in xtab_vals plus a row
279+ # for xtab_datanames.
280+ extra_cols = len(xtab_datanames) - 1
281+ for i in range(len(xtab_colnames)):
282+ outstrings = ['' for n in rownames]
283+ for n in xtab_vals:
284+ outstrings.append(n[i].replace("'", "''"))
285+ for x in range(extra_cols):
286+ outstrings.append('')
287+ csvout.writerow(outstrings)
288+ # Write last header row of xtab_datanames
289+ outstrings = [n for n in rownames]
290+ for n in xtab_vals:
291+ for i in xtab_datanames:
292+ outstrings.append(i.replace("'", "''"))
293+ csvout.writerow(outstrings)
294+ else: # header_rows == hdrs_labeled
295+ extra_cols = len(xtab_datanames) - 1
296+ for i in range(len(xtab_colnames)):
297+ outstrings = ['' for n in rownames]
298+ for n in xtab_vals:
299+ outstrings.append("%s: %s" % (xtab_colnames[i], n[i].replace("'", "''")))
300+ for x in range(extra_cols):
301+ outstrings.append('')
302+ csvout.writerow(outstrings)
303+ # Write last header row of xtab_datanames
304+ outstrings = [n for n in rownames]
305+ for n in xtab_vals:
306+ for i in xtab_datanames:
307+ outstrings.append(i.replace("'", "''"))
308+ csvout.writerow(outstrings)
309+
310+ # Write output data
311+ # For each unique combination of row headers
312+ # Initiate a new output line
313+ # Get the row headers
314+ # For every item in the xtab_vals
315+ # Select the 'xtab_datanames' columns from sqlite for the row headers and xtab_vals
316+ # Append the first result (set warning if >1) to the output line
317+ # Write the output line
318+ #
319+ # Get a list of unique combinations of row headers
320+ if sort_alpha:
321+ sqlcmd = "SELECT DISTINCT %s FROM %s ORDER BY %s;" % (",".join(rownames), tablename, ",".join(rownames))
322+ else:
323+ sqlcmd = "SELECT DISTINCT %s FROM %s;" % (",".join(rownames), tablename)
324+ row_hdr_vals = sqldb.execute(sqlcmd).fetchall()
325+ row_counter = 0
326+
327+ for l in row_hdr_vals:
328+ row_counter = row_counter + 1
329+ col_counter = 0
330+ outstrings = []
331+ # Add the row headers to the list of outstrings
332+ for rn in range(len(l)):
333+ outstrings.append(l[rn].replace("'", "''"))
334+ # Make a list of WHERE conditions for the row header variables
335+ sqlcond = ["%s='%s'" % (rownames[i], l[i].replace("'", "''")) for i in range(len(rownames))]
336+ for n in xtab_vals:
337+ col_counter = col_counter + 1
338+ # Add the WHERE conditions for the crosstab values
339+ selcond = copy.deepcopy(sqlcond)
340+ for cn in range(len(xtab_colnames)):
341+ selcond.append("%s='%s'" % (xtab_colnames[cn], n[cn].replace("'", "''")))
342+ # Create and execute the SQL to get the data values
343+ sqlcmd = "SELECT %s FROM %s WHERE %s" % (",".join(xtab_datanames), tablename, " AND ".join(selcond))
344+ if sql_reporter:
345+ sql_reporter.log(logging.INFO, "%s" % sqlcmd)
346+ # <Debugging>
347+ #print(sqlcmd)
348+ #return
349+ # </Debugging>
350+ data_vals = sqldb.execute(sqlcmd).fetchall()
351+ if sql_reporter:
352+ for r in data_vals:
353+ sql_reporter.log(logging.INFO, "\t%s" % "\t".join(r))
354+ if len(data_vals) > 1:
355+ multiple_vals = True
356+ reportable_errors += 1
357+ if error_reporter:
358+ error_reporter.log(logging.WARNING, "Multiple result rows for the command '%s'--only the first is used." % (sqlcmd))
359+ if len(data_vals) == 0:
360+ if nullfill:
361+ for n in range(len(xtab_datanames)):
362+ outstrings.append(nullfill)
363+ else:
364+ for n in range(len(xtab_datanames)):
365+ outstrings.append('')
366+ else:
367+ data = data_vals[0]
368+ for n in range(len(xtab_datanames)):
369+ outstrings.append(data[n])
370+ csvout.writerow(outstrings)
371+ sqldb.close()
372+ if file_db and not keep_file_db:
373+ try:
374+ os.unlink(sqldbname)
375+ except:
376+ pass
377+ outfile.close()
378+ if multiple_vals and not error_reporter:
379+ msg = "Warning: multiple data values found for at least one crosstab cell; only the first is displayed."
380+ print(msg)
381+ if error_reporter:
382+ error_reporter(msg)
383+ return reportable_errors
384+
385+
386+def unquote(str):
387+ """Remove quotes surrounding a string."""
388+ if len(str) < 2:
389+ return str
390+ c1 = str[0]
391+ c2 = str[-1:]
392+ if c1==c2 and (c1=='"' or c1=="'"):
393+ return str[1:-1].replace("%s%s" % (c1, c1), c1)
394+ return str
395+
396+
397+def quote_str(str):
398+ """Add single quotes around a string."""
399+ if len(str) == 0:
400+ return "''"
401+ if len(str) == 1:
402+ if str == "'":
403+ return "''''"
404+ else:
405+ return "'%s'" % str
406+ if str[0] != "'" or str[-1:] != "'":
407+ return "'%s'" % str.replace("'", "''")
408+ return str
409+
410+
411+def quote_list(l):
412+ """Add single quotes around all strings in the list."""
413+ return [quote_str(x) for x in l]
414+
415+
416+def quote_list_as_str(l):
417+ """Convert a list of strings to a single string of comma-delimited, quoted tokens."""
418+ return ",".join(quote_list(l))
419+
420+
421+def del_file(fn):
422+ """Deletes the specified file if it exists."""
423+ if os.path.isfile(fn):
424+ os.unlink(fn)
425+
426+
427+def copy_to_sqlite(data_fn, sqlite_fn=None, tablename="src"):
428+ """Copies data from a CSV file to a sqlite table.
429+ Arguments:
430+ data_fn: a string of the data file name with the data to be read.
431+ sqlite_fn: a string of the name of the sqlite file to create, or None if
432+ sqlite is to use memory instead.
433+ tablename: the name of the sqlite table to create
434+ Value:
435+ The sqlite connection object.
436+ """
437+ dialect = csv.Sniffer().sniff(open(data_fn, "rt").readline())
438+ inf = csv.reader(open(data_fn, "rt"), dialect)
439+ column_names = inf.next()
440+ if sqlite_fn == None:
441+ conn = sqlite3.connect(":memory:")
442+ else:
443+ try:
444+ os.unlink(sqlite_fn)
445+ except:
446+ pass
447+ conn = sqlite3.connect(sqlite_fn)
448+ if tablename == None:
449+ tablename = "src"
450+ colstr = ",".join(column_names)
451+ try:
452+ conn.execute("drop table %s;" % tablename)
453+ except:
454+ pass
455+ conn.execute("create table %s (%s);" % (tablename, colstr))
456+ for l in inf:
457+ sql = "insert into %s values (%s);" % (tablename, quote_list_as_str(l))
458+ conn.execute(sql)
459+ conn.commit()
460+ return conn
461+
462+
463+def print_help():
464+ """Print a program description and brief usage instructions to the console."""
465+ print "xtab %s %s -- Cross-tabulates data." % (_version, _vdate)
466+ print __help_msg
467+
468+
469+def get_opts(arglist):
470+ """Returns a dictionary of command-line arguments. This custom 'getopt' routine is used
471+ to allow multiple column names for the -r, -c, and -v arguments with only one use of each
472+ flag.
473+ """
474+ argdict = {}
475+ nargs = len(arglist)
476+ argno = 1
477+ currarg = None
478+ currargitems = []
479+ while argno < nargs:
480+ arg = arglist[argno]
481+ if len(arg) > 0:
482+ if arg[0] == '-':
483+ if currarg:
484+ argdict[currarg] = currargitems
485+ currarg = arg
486+ currargitems = []
487+ else:
488+ if currarg:
489+ currargitems.append(arg)
490+ else:
491+ argdict[arg] = []
492+ argno += 1
493+ if currarg:
494+ argdict[currarg] = currargitems
495+ return argdict
496+
497+
498+def main():
499+ """Read and interpret the command-line arguments and options, and carry out
500+ the appropriate actions."""
501+ args = get_opts(sys.argv)
502+ if len(args) == 0 or args.has_key('-h') or args.has_key('--help'):
503+ print_help()
504+ sys.exit(0)
505+ badopts = [ o for o in args.keys() if o not in ['-i', '-o', '-r',
506+ '-c', '-v', '-d', '-d1', '-d2', '-d3', '-d4', '-f', '-k', '-s', '-t',
507+ '-n', '-e', '-q'] ]
508+ if len(badopts) > 0:
509+ raise ValueError, "Unrecognized option(s): %s" % ", ".join(badopts)
510+ if args.has_key('-i'):
511+ if len(args['-i']) == 0:
512+ raise ValueError, _errmsg_noinfile
513+ infilename = args['-i'][0]
514+ if not os.path.exists(infilename):
515+ raise ValueError, "%s (%s)" % (_errmsg_badinfile, infilename)
516+ else:
517+ raise ValueError, _errmsg_noinfile
518+ #
519+ if args.has_key('-o'):
520+ if len(args['-o']) == 0:
521+ raise ValueError, _errmsg_nooutfile
522+ outfilename = args['-o'][0]
523+ else:
524+ raise ValueError, _errmsg_nooutfile
525+ #
526+ if args.has_key('-r'):
527+ if len(args['-r']) == 0:
528+ raise ValueError, _errmsg_norowheaders
529+ rowheaders = args['-r']
530+ else:
531+ raise ValueError, _errmsg_norowheaders
532+ #
533+ if args.has_key('-c'):
534+ if len(args['-c']) == 0:
535+ raise ValueError, _errmsg_nocolumheaders
536+ columnheaders = args['-c']
537+ else:
538+ raise ValueError, _errmsg_nocolumheaders
539+ #
540+ if args.has_key('-v'):
541+ if len(args['-v']) == 0:
542+ raise ValueError, _errmsg_nocellcolumns
543+ cellvalues = args['-v']
544+ else:
545+ raise ValueError, _errmsg_nocellcolumns
546+ #
547+ hdr_opt = hdrs_1
548+ if args.has_key('-d') or args.has_key('-d2'):
549+ hdr_opt = hdrs_2
550+ if args.has_key('-d3'):
551+ hdr_opt = hdrs_many
552+ if args.has_key('-d4'):
553+ hdr_opt = hdrs_labeled
554+ file_db = args.has_key('-f')
555+ keep_file_db = args.has_key('-k')
556+ tablename = 'src'
557+ if args.has_key('-t'):
558+ if len(args['-t']) == 1:
559+ tablename = args['-t'][0]
560+ nullfill = None
561+ if args.has_key('-n'):
562+ if len(args['-n']) == 1:
563+ nullfill = args['-n'][0]
564+ sort_alpha = args.has_key('-s')
565+ #
566+ # Set up logging
567+ #logging.basicConfig(level=logging.INFO, filemode="w", filename='')
568+ err_logger = None
569+ sql_logger = None
570+ error_file = None
571+ if args.has_key('-e'):
572+ err_logger = logging.getLogger("err")
573+ err_logger.setLevel(logging.WARNING)
574+ if len(args['-e']) == 0:
575+ err_logger.addHandler(logging.StreamHandler())
576+ else:
577+ if len(args['-e']) > 1:
578+ raise ValueError, _errmsg_baderrlogfile
579+ error_file = args['-e'][0]
580+ del_file(error_file)
581+ file_logger = logging.FileHandler(error_file, "w")
582+ err_logger.addHandler(file_logger)
583+ if args.has_key('-q'):
584+ if len(args['-q']) <> 1:
585+ raise ValueError, _errmsg_badsqllogfile
586+ sql_logger = logging.getLogger("sql")
587+ sql_logger.setLevel(logging.INFO)
588+ sql_logger.addHandler(logging.FileHandler(args['-q'][0], "w"))
589+ #
590+ errors = xtab(infilename, rowheaders, columnheaders, cellvalues, outfilename, hdr_opt,
591+ file_db, keep_file_db, tablename, err_logger, sql_logger, nullfill, sort_alpha)
592+ if errors == 0 and error_file is not None:
593+ # Logger can create the file if a message below the warning level
594+ # is issued, even though it will not be logged.
595+ file_logger.close()
596+ del_file(error_file)
597+
598+
599+
600+if __name__=='__main__':
601+ try:
602+ main()
603+ except SystemExit, x:
604+ sys.exit(x)
605+ except ValueError, e:
606+ sys.stderr.write("%s\n" % str(e))
607+ sys.exit(1)
608+ except Exception:
609+ strace = traceback.extract_tb(sys.exc_info()[2])[-1:]
610+ lno = strace[0][1]
611+ src = strace[0][3]
612+ sys.stderr.write("%s: Uncaught exception %s (%s) on line %s (%s)." % (os.path.basename(sys.argv[0]), str(sys.exc_info()[0]), sys.exc_info()[1], lno, src))
613+ sys.exit(1)
614+
Show on old repository browser