Discussion:
[jira] Created: (MIME4J-52) Infinite loop when nested multipart is missing end boundary
Niklas Therning (JIRA)
2008-07-14 08:25:33 UTC
Infinite loop when nested multipart is missing end boundary
-----------------------------------------------------------

Key: MIME4J-52
URL: https://issues.apache.org/jira/browse/MIME4J-52
Project: Mime4j
Issue Type: Bug
Affects Versions: 0.4
Reporter: Niklas Therning
Fix For: 0.4


I'm attaching a message which causes Mime4j to loop indefinitely in non-strict mode. The message contains badly formatted MIME: the inner multipart has no end boundary. Mime4j 0.3 handled this situation without any problems: the inner multipart would stop at --outer-boundary-- and the AAA would be part of the preamble for the inner multipart.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
Niklas Therning (JIRA)
2008-07-14 08:27:32 UTC
[ https://issues.apache.org/jira/browse/MIME4J-52?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Niklas Therning updated MIME4J-52:
----------------------------------

Attachment: 36387089_3.msg
Stefano Bagnara (JIRA)
2008-07-14 09:45:31 UTC
[ https://issues.apache.org/jira/browse/MIME4J-52?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12613275#action_12613275 ]

Stefano Bagnara commented on MIME4J-52:
---------------------------------------

For the record, the loop also happens with a simplified message:
----
Content-Type: multipart/mixed; boundary="outer-boundary"

Outer preamble
----

So the loop happens when a multipart is declared but never starts.

The event sequence for this message is:

0 T_START_MESSAGE
3 T_START_HEADER
4 T_FIELD
5 T_END_HEADER
6 T_START_MULTIPART
8 T_PREAMBLE
----- Loop ----
10 T_START_BODYPART
3 T_START_HEADER
- Unexpected end of headers detected. Higher level boundary detected or EOF reached.
- Invalid header encountered
5 T_END_HEADER
12 T_BODY
11 T_END_BODYPART
-2 T_IN_BODYPART (not exposed)
----- End Loop ----

I also found that advancing from T_START_MULTIPART to T_PREAMBLE makes a call to MimeBoundaryInputStream.skipBoundary with this buffer:
[pos: 0][limit: 16][Outer preamble
]

What happens next is that MimeStreamParserTest$TestHandler.preamble is called, and the first in.read issued by that test method (which reads the preamble stream) calls BufferingInputStreamAdaptor.read, which in turn calls MimeBoundaryInputStream.read. At this point the stream already has endOfStream() true and hasData() false.

The bug seems to be here: MimeBoundaryInputStream.limit is 16 and MimeBoundaryInputStream.buffer.limit is also 16, so hasData returns false.

For now I tried changing hasData from:
return limit > buffer.pos() && limit < buffer.limit();
to
return limit > buffer.pos() && limit <= buffer.limit();

and this seems to fix the issue.
But this is purely TDD because I don't know what I did :-)
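
To make the off-by-one concrete, here is a minimal, self-contained sketch of the two comparisons. The class WindowCheck and its field names are hypothetical stand-ins, not Mime4j code; only the two boolean expressions are taken from the change above. It assumes the values seen in the debugger (pos 0, limit 16, buffer limit 16).

```java
// Hypothetical stand-in for the state held by MimeBoundaryInputStream and its
// buffer. The names are illustrative and not taken from the Mime4j sources.
public class WindowCheck {
    final int pos;          // next unread position in the buffer
    final int limit;        // end of the current part as seen by the stream
    final int bufferLimit;  // end of valid data in the underlying buffer

    public WindowCheck(int pos, int limit, int bufferLimit) {
        this.pos = pos;
        this.limit = limit;
        this.bufferLimit = bufferLimit;
    }

    // Original comparison: rejects the case where the part ends exactly
    // where the buffered data ends.
    public boolean hasDataStrict() {
        return limit > pos && limit < bufferLimit;
    }

    // Changed comparison: the part may extend up to and including the
    // end of the buffered data.
    public boolean hasDataInclusive() {
        return limit > pos && limit <= bufferLimit;
    }

    public static void main(String[] args) {
        WindowCheck w = new WindowCheck(0, 16, 16);
        System.out.println("strict:    " + w.hasDataStrict());
        System.out.println("inclusive: " + w.hasDataInclusive());
    }
}
```

With limit equal to the buffer limit, the strict form claims there is nothing to read even though sixteen bytes sit between pos and limit, which is consistent with the parser never making progress.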

I also tried the original message you attached, but I see the following result:
--------
<message>
<header>
<field>
Content-Type: multipart/mixed; boundary="outer-boundary"</field>
</header>
<multipart>
<preamble>
Outer preamble
</preamble>
<body-part>
<header>
<field>
Content-Type: text/plain</field>
</header>
<body>
Foo
</body>
</body-part>
<body-part>
<header>
<field>
Content-Type: multipart/alternative; boundary="inner-boundary"</field>
</header>
<multipart>
<preamble>
AAA

--outer-boundary--
Outer epilouge
</preamble>
<body-part>
<header>
</header>
<body>
</body>
</body-part>
<epilogue>
</epilogue>
</multipart>
</body-part>
<epilogue>
</epilogue>
</multipart>
</message>
-------

And this seems wrong, because the outer boundary and the outer epilogue should not be part of the preamble.

So there must be something more to fix.
Oleg Kalnichevski (JIRA)
2008-07-14 11:39:32 UTC
[ https://issues.apache.org/jira/browse/MIME4J-52?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12613295#action_12613295 ]

Oleg Kalnichevski commented on MIME4J-52:
-----------------------------------------

I'll look into fixing the infinite loop. Unfortunately I do not see an easy way of fixing the way corrupt multipart entities are parsed. The parser in its present form tracks only one boundary at a time. It can't resume looking for the outer boundary until the inner one is properly terminated. This approach has its advantages and disadvantages. On the plus side, it is more efficient in terms of both performance and memory footprint. Also, the parser correctly handles inner body parts that accidentally contain the outer boundary. On the negative side, a malformed multipart entity messes up the rest of the message completely.

Oleg
Stefano Bagnara
2008-07-14 11:58:16 UTC
The < to <= change already fixes the infinite loop.
Whether we want to keep the current behaviour or restore the old
behaviour is something we have to discuss on the list.

I think Niklas and other old-time mime4j users can help us with this
dilemma.

In the meantime I'm preparing new messages for the test suite so we
have something concrete to discuss.

Stefano
Stefano Bagnara
2008-07-14 12:19:08 UTC
Here is the RFC text (RFC 2046):
-------------
5.1.2. Handling Nested Messages and Multiparts

The "message/rfc822" subtype defined in a subsequent section of this
document has no terminating condition other than running out of data.
Similarly, an improperly truncated "multipart" entity may not have
any terminating boundary marker, and can turn up operationally due to
mail system malfunctions.

It is essential that such entities be handled correctly when they are
themselves imbedded inside of another "multipart" structure. MIME
implementations are therefore required to recognize outer level
boundary markers at ANY level of inner nesting. It is not sufficient
to only check for the next expected marker or other terminating
condition.
---------------
In particular the last sentence:

It is not sufficient to only check for the next expected marker or other
terminating condition.

It seems we'll have to think twice about this.
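
As a rough illustration of what the RFC asks for, the sketch below keeps every enclosing boundary on a stack and tests each line against all of them, innermost first. It is only a line-oriented sketch under my own assumptions: the BoundaryStack class and its methods are hypothetical and do not exist in Mime4j, and it ignores details such as the CRLF that precedes a delimiter.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical helper, not part of Mime4j: remembers all enclosing boundaries
// so a line can be checked against every nesting level, not just the innermost.
public class BoundaryStack {
    // Head of the deque is the innermost (most recently entered) boundary.
    private final Deque<String> boundaries = new ArrayDeque<>();

    public void enter(String boundary) {
        boundaries.push(boundary);
    }

    public void leave() {
        boundaries.pop();
    }

    /**
     * Returns 0 if the line is a delimiter (or close delimiter) of the
     * innermost multipart, 1 for its parent, and so on. Returns -1 if the
     * line is not a delimiter of any enclosing multipart.
     */
    public int matchDepth(String line) {
        // Trailing whitespace after a delimiter is tolerated.
        String candidate = line.replaceAll("\\s+$", "");
        if (!candidate.startsWith("--")) {
            return -1;
        }
        candidate = candidate.substring(2);
        int depth = 0;
        for (String boundary : boundaries) {
            if (candidate.equals(boundary) || candidate.equals(boundary + "--")) {
                return depth;
            }
            depth++;
        }
        return -1;
    }

    public static void main(String[] args) {
        BoundaryStack stack = new BoundaryStack();
        stack.enter("outer-boundary");
        stack.enter("inner-boundary");
        System.out.println(stack.matchDepth("--inner-boundary"));
        System.out.println(stack.matchDepth("--outer-boundary--"));
        System.out.println(stack.matchDepth("AAA"));
    }
}
```

A result greater than 0 would tell the parser that the inner multipart was truncated and that it should close that many levels before continuing.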

Stefano
Stefano Bagnara (JIRA)
2008-07-14 12:03:32 UTC
[ https://issues.apache.org/jira/browse/MIME4J-52?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12613297#action_12613297 ]

Stefano Bagnara commented on MIME4J-52:
---------------------------------------

I just committed your file as missing-inner-boundary.msg
Stefano Bagnara (JIRA)
2008-07-14 18:06:31 UTC
[ https://issues.apache.org/jira/browse/MIME4J-52?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12613389#action_12613389 ]

Stefano Bagnara commented on MIME4J-52:
---------------------------------------

I think I found a solution by using nested "MimeBoundaryInputStream-InputBuffer" tuples instead of always recreating MimeBoundaryInputStream from the root buffer.

This also involves many other changes (removing the use of InputStream+InputBuffer in favor of a single InputBuffer), so I will clean things up and create a branch to show the solution, which lets the outer boundary keep checking the stream with higher priority.
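
The nesting idea can be sketched roughly as follows. This is a simplified, line-based illustration and not the code from the branch: LineSource, BoundaryLimited and drain are hypothetical names, and real Mime4j works on byte buffers rather than lines. Each layer reads through its parent, so the parent sees every line first and can end the stream when its own boundary appears.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch of nested boundary-limited sources; not Mime4j code.
public class NestedBoundarySource {

    /** Minimal line source. Returns null at end of data. */
    public interface LineSource {
        String readLine();
    }

    public static LineSource fromLines(List<String> lines) {
        Iterator<String> it = lines.iterator();
        return () -> it.hasNext() ? it.next() : null;
    }

    /** Reads from a parent source and ends when its own boundary shows up. */
    public static final class BoundaryLimited implements LineSource {
        private final LineSource parent;
        private final String boundary;
        private boolean ended;

        public BoundaryLimited(LineSource parent, String boundary) {
            this.parent = parent;
            this.boundary = boundary;
        }

        @Override
        public String readLine() {
            if (ended) {
                return null;
            }
            // The parent is asked first, so an outer boundary (or EOF)
            // ends every layer nested inside it.
            String line = parent.readLine();
            if (line == null
                    || line.equals("--" + boundary)
                    || line.equals("--" + boundary + "--")) {
                ended = true; // the delimiter line itself is consumed
                return null;
            }
            return line;
        }
    }

    public static List<String> drain(LineSource source) {
        List<String> out = new ArrayList<>();
        for (String line = source.readLine(); line != null; line = source.readLine()) {
            out.add(line);
        }
        return out;
    }

    public static void main(String[] args) {
        LineSource root = fromLines(Arrays.asList(
                "AAA", "", "--outer-boundary--", "Outer epilogue"));
        LineSource outer = new BoundaryLimited(root, "outer-boundary");
        LineSource inner = new BoundaryLimited(outer, "inner-boundary");
        System.out.println(drain(inner));    // content seen by the inner multipart
        System.out.println(root.readLine()); // what remains after the outer boundary
    }
}
```

Because the inner layer never sees the outer delimiter, a missing inner end boundary can no longer swallow the rest of the message. The price is one extra wrapping layer per MIME level.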
Oleg Kalnichevski (JIRA)
2008-07-14 18:30:32 UTC
[ https://issues.apache.org/jira/browse/MIME4J-52?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12613403#action_12613403 ]

Oleg Kalnichevski commented on MIME4J-52:
-----------------------------------------

Yes, that should do it. I also thought about the problem and found it not as bad as I had initially thought.

Oleg
Niklas Therning (JIRA)
2008-07-14 20:08:31 UTC
[ https://issues.apache.org/jira/browse/MIME4J-52?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12613434#action_12613434 ]

Niklas Therning commented on MIME4J-52:
---------------------------------------

Stefano, wouldn't that mean that we get some double buffering of the data?
Stefano Bagnara
2008-07-14 22:31:52 UTC
Yes, this introduces new buffering levels for each mime layer.
This could be improved later: ATM I would like to see all the bugs we
found closed and to have a 0.4 release, then it should be easy to
optimize this case.

Stefano
Stefano Bagnara (JIRA)
2008-07-15 10:13:32 UTC
[ https://issues.apache.org/jira/browse/MIME4J-52?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stefano Bagnara reassigned MIME4J-52:
-------------------------------------

Assignee: Stefano Bagnara
Stefano Bagnara (JIRA)
2008-07-18 09:04:31 UTC
[ https://issues.apache.org/jira/browse/MIME4J-52?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stefano Bagnara resolved MIME4J-52.
-----------------------------------

Resolution: Fixed

Branch has been merged.