Hello abap2xlsx team,
we have been using abap2xlsx for years to export data from and import data into our CRM system. Now we have run into the problem that the following error occurs when opening the generated .xlsx:
Removed Part: /xl/sharedStrings.xml part with XML error. (Strings) Illegal xml character.
I started investigating and, with the help of Firefox, which clearly marks where an XML file has errors, discovered that the file contains the "Thumbs up" character (U+1F44D). I tracked the problem down further and found that abap2xlsx is not at fault. It seems to be a problem with how the iXML class performs the encoding conversion when UTF-8 encoding is specified: when I set the encoding to UTF-16LE, the character is embedded in the XML file correctly and the file is valid.
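As a cross-check outside of the SAP system, the bytes a correct UTF-8 rendering should contain can be computed independently. The following is a minimal Python sketch, not part of the ABAP report; the function names `utf16le_hex_to_utf8` and `is_valid_utf8` are made up for illustration. It takes the UTF-16LE hex sequence used in the demo report, re-encodes the text as UTF-8, and offers a helper to test whether a byte string is well-formed UTF-8.

```python
# Sketch only: computes the expected bytes outside of SAP; it does not call iXML.

# UTF-16LE bytes for "Thumbs up " followed by U+1F44D, as used in the demo report
UTF16LE_HEX = "5400680075006D006200730020007500700020003DD84DDC"


def utf16le_hex_to_utf8(hex_text: str) -> bytes:
    """Decode a UTF-16LE hex dump and re-encode the text as UTF-8."""
    text = bytes.fromhex(hex_text).decode("utf-16-le")
    return text.encode("utf-8")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte string is well-formed UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


expected = utf16le_hex_to_utf8(UTF16LE_HEX)
print(expected.hex().upper())   # prints 5468756D627320757020F09F918D
print(is_valid_utf8(expected))  # prints True
```

The last four bytes, F0 9F 91 8D, are the UTF-8 form of U+1F44D. Comparing the element content in the hex dump of the rendered xstring with this value shows where the rendered bytes deviate.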
I then searched for SAP Notes in the application area BC-ABA-XML, which is maintained for the package SIXML that the class CL_IXML is assigned to. I found:
1559677 - XML renderer generates invalid XML
1750204 - iXML: Treatment of special and invalid characters
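Independent of these notes, whether a code point is legal in XML can be checked against the Char production of the XML 1.0 specification. The following Python sketch does that; the helper names `is_xml10_char` and `illegal_xml_chars` are hypothetical and exist only for this illustration. It shows that U+1F44D itself is an allowed XML character, while a control character such as U+000B is not.

```python
# Sketch only: checks code points against the XML 1.0 "Char" production.

def is_xml10_char(code_point: int) -> bool:
    """True if the code point matches the Char production of XML 1.0."""
    return (
        code_point in (0x9, 0xA, 0xD)          # tab, line feed, carriage return
        or 0x20 <= code_point <= 0xD7FF
        or 0xE000 <= code_point <= 0xFFFD
        or 0x10000 <= code_point <= 0x10FFFF   # supplementary planes, incl. U+1F44D
    )


def illegal_xml_chars(text: str) -> list:
    """List the code points in text that XML 1.0 does not allow."""
    return [f"U+{ord(ch):04X}" for ch in text if not is_xml10_char(ord(ch))]


print(illegal_xml_chars("Thumbs up \U0001F44D"))  # prints []
print(illegal_xml_chars("a\x0bb"))                # prints ['U+000B']
```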
To check whether the character is invalid, I implemented the code provided in note 1559677 in the following demo report. The report creates the XML both with iXML and manually. When the report is executed with encoding UTF-8, the XML generated by iXML is invalid. When it is executed with UTF-16, the XML generated by iXML is valid as well.
REPORT zdemo_excel1_unicode_xml.

DATA: lv_string   TYPE string,
      lv_xml      TYPE string,
      lv_xml_x    TYPE xstring,
      lv_encoding TYPE abap_encoding,
      lv_xsting   TYPE xstring,
      lv_content  TYPE xstring.

PARAMETERS: p_chars TYPE string DEFAULT 'UTF-16'. " Export is OK for UTF-16, but try with UTF-8

" Class from Note 1559677 - XML renderer generates invalid XML
CLASS lcl_replace_chars DEFINITION FINAL.
  PUBLIC SECTION.
    CLASS-METHODS replace_invalid_xml_chars
      CHANGING c_string TYPE string.
    CLASS-METHODS class_constructor.
  PRIVATE SECTION.
    CLASS-DATA ctrls TYPE string.
    CLASS-DATA replc TYPE string VALUE ` `.
ENDCLASS.

CLASS lcl_replace_chars IMPLEMENTATION.
  METHOD replace_invalid_xml_chars.
    TRANSLATE c_string USING ctrls.
  ENDMETHOD.
  METHOD class_constructor.
    DO 32 TIMES.
      CHECK NOT ( sy-index = 10 OR sy-index = 11 OR sy-index = 14 ).
      ctrls = |{ ctrls }{ cl_abap_conv_in_ce=>uccpi( sy-index - 1 ) }| & |{ replc }|.
    ENDDO.
  ENDMETHOD.
ENDCLASS.
START-OF-SELECTION.

  " The following hex sequence is the UTF-16LE representation of the text
  " Thumbs up + U+1F44D http://graphemica.com/%F0%9F%91%8D
  lv_xsting = '5400680075006D006200730020007500700020003DD84DDC'.

  " Convert the hex sequence into a string
  TRY.
      lv_encoding = '4103'. " utf-16le
      cl_abap_conv_in_ce=>create(
        EXPORTING
          encoding = lv_encoding " Input Character Format
          input    = lv_xsting
        RECEIVING
          conv     = DATA(lr_conv) " New Converter Instance
      ).
      lr_conv->read(
        IMPORTING
          data = lv_string " Data Object To Be Read
      ).
    CATCH cx_sy_conversion_codepage.
    CATCH cx_sy_codepage_converter_init.
    CATCH cx_parameter_invalid_type.
    CATCH cx_parameter_invalid_range.
  ENDTRY.

  " Create XML using IXML
  DATA(lo_ixml) = cl_ixml=>create( ).
  DATA(lo_encoding) = lo_ixml->create_encoding(
    byte_order    = if_ixml_encoding=>co_platform_endian
    character_set = p_chars ).
  DATA(lo_document) = lo_ixml->create_document( ).
  lo_document->set_encoding( lo_encoding ).
  lo_document->set_standalone( abap_true ).
  DATA(lo_element_root) = lo_document->create_simple_element(
    name   = 'demo'
    parent = lo_document ).
  lo_element_root->set_value( value = lv_string ).

  " Create xstring stream
  DATA(lo_streamfactory) = lo_ixml->create_stream_factory( ).
  DATA(lo_ostream) = lo_streamfactory->create_ostream_xstring( string = lv_content ).
  DATA(lo_renderer) = lo_ixml->create_renderer(
    ostream  = lo_ostream
    document = lo_document ).
  lo_renderer->render( ).

  " Set a breakpoint here and use the view "XML-Browser" to display the content of lv_content
  WRITE: / lv_content.

  " Create XML by hand and convert to UTF-8
  CONCATENATE '<demo>' lv_string '</demo>' INTO lv_xml.
  CONCATENATE '<?xml version="1.0" encoding="utf-8" standalone="yes" ?>' lv_xml
    INTO lv_xml SEPARATED BY cl_abap_char_utilities=>cr_lf.
  DATA(lr_conv_out) = cl_abap_conv_out_ce=>create(
    EXPORTING
      encoding = 'UTF-8' " Output Character Format
  ).
  lr_conv_out->write( EXPORTING data = lv_xml ).
  lv_xml_x = lr_conv_out->get_buffer( ).

  " Set a breakpoint here and use the view "XML-Browser" to display the content of lv_xml_x
  WRITE: lv_xml_x.

  " Test method from Note 1559677 - XML renderer generates invalid XML
  " to remove invalid characters
  WRITE: / 'Before:', lv_string.
  lcl_replace_chars=>replace_invalid_xml_chars( CHANGING c_string = lv_string ).
  WRITE: / 'After:', lv_string.

Can you cross-check my investigation and the report? If you also think it's a bug that SAP should fix, I will raise an incident.
Best regards
Gregor